[jira] [Updated] (ARROW-6790) [Release] Automatically disable integration test cases in release verification

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6790:
--
Labels: pull-request-available  (was: )

> [Release] Automatically disable integration test cases in release verification
> --
>
> Key: ARROW-6790
> URL: https://issues.apache.org/jira/browse/ARROW-6790
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Minor
>  Labels: pull-request-available
>
> If dev/release/verify-release-candidate.sh is run with selective testing and 
> includes integration tests, the selected implementations should be the only 
> ones enabled when running the integration test portion. For example:
> TEST_DEFAULT=0 \
> TEST_CPP=1 \
> TEST_JAVA=1 \
> TEST_INTEGRATION=1 \
> dev/release/verify-release-candidate.sh source 0.15.0 2
> This should run the integration tests only for C++ and Java.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6790) [Release] Automatically disable integration test cases in release verification

2019-10-03 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-6790:
---

 Summary: [Release] Automatically disable integration test cases in 
release verification
 Key: ARROW-6790
 URL: https://issues.apache.org/jira/browse/ARROW-6790
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Bryan Cutler
Assignee: Bryan Cutler


If dev/release/verify-release-candidate.sh is run with selective testing and 
includes integration tests, the selected implementations should be the only 
ones enabled when running the integration test portion. For example:

TEST_DEFAULT=0 \
TEST_CPP=1 \
TEST_JAVA=1 \
TEST_INTEGRATION=1 \
dev/release/verify-release-candidate.sh source 0.15.0 2

This should run the integration tests only for C++ and Java.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6789) [Python] Automatically box bytes/buffer-like values yielded from `FlightServerBase.do_action` in Result values

2019-10-03 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6789:
---

 Summary: [Python] Automatically box bytes/buffer-like values 
yielded from `FlightServerBase.do_action` in Result values
 Key: ARROW-6789
 URL: https://issues.apache.org/jira/browse/ARROW-6789
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney
 Fix For: 1.0.0


This will help reduce boilerplate in server implementations.
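
For reference, a minimal sketch of the boilerplate in question, assuming the 
standard {{pyarrow.flight}} API; the server class and payload below are 
hypothetical and only illustrate what automatic boxing would save:

{code:python}
import pyarrow.flight as flight

class EchoActionServer(flight.FlightServerBase):  # hypothetical example server
    def do_action(self, context, action):
        payload = action.body.to_pybytes()
        # Today every yielded value must be wrapped explicitly:
        yield flight.Result(payload)
        # With the proposed change, yielding raw bytes/buffer-like values
        # (e.g. `yield payload`) would be boxed into Result automatically.
{code}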



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6736) [Rust] [DataFusion] Aggregate expressions get evaluated repeatedly

2019-10-03 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-6736.
---
Resolution: Fixed

Issue resolved by pull request 5542
[https://github.com/apache/arrow/pull/5542]

> [Rust] [DataFusion] Aggregate expressions get evaluated repeatedly
> --
>
> Key: ARROW-6736
> URL: https://issues.apache.org/jira/browse/ARROW-6736
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Affects Versions: 0.15.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> There is a design flaw in the new aggregate expression traits and 
> implementations where the input to the aggregate expression gets evaluated 
> against the whole batch once for each row in the batch. For example, if the 
> batch has 1024 rows then the expression gets evaluated 1024 times instead of 
> once.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-3808) [R] Implement [.arrow::Array

2019-10-03 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-3808:
---
Fix Version/s: (was: 0.15.0)
   1.0.0

> [R] Implement [.arrow::Array
> 
>
> Key: ARROW-3808
> URL: https://issues.apache.org/jira/browse/ARROW-3808
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Romain Francois
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-3808) [R] Implement [.arrow::Array

2019-10-03 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-3808.

Fix Version/s: (was: 1.0.0)
   0.15.0
   Resolution: Fixed

Issue resolved by pull request 5531
[https://github.com/apache/arrow/pull/5531]

> [R] Implement [.arrow::Array
> 
>
> Key: ARROW-3808
> URL: https://issues.apache.org/jira/browse/ARROW-3808
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Romain Francois
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6688) [Packaging] Include s3 support in the conda packages

2019-10-03 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-6688.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5484
[https://github.com/apache/arrow/pull/5484]

> [Packaging] Include s3 support in the conda packages 
> -
>
> Key: ARROW-6688
> URL: https://issues.apache.org/jira/browse/ARROW-6688
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6788) [CI] Migrate Travis CI lint job to GitHub Actions

2019-10-03 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6788:
---

 Summary: [CI] Migrate Travis CI lint job to GitHub Actions
 Key: ARROW-6788
 URL: https://issues.apache.org/jira/browse/ARROW-6788
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Wes McKinney
 Fix For: 1.0.0


Depends on ARROW-5802. As far as I can tell, GitHub Actions jobs start more or 
less immediately, so this will give contributors more prompt feedback.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6787) [CI] Decommission "C++ with clang 7 and system packages" Travis CI job

2019-10-03 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6787:
---

 Summary: [CI] Decommission "C++ with clang 7 and system packages" 
Travis CI job
 Key: ARROW-6787
 URL: https://issues.apache.org/jira/browse/ARROW-6787
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


Now that this is running in GitHub Actions, we can probably skip it in Travis 
CI?

Any other barriers to turning this off and saving the CI build time?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6634) [C++] Do not require flatbuffers or flatbuffers_ep to build

2019-10-03 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-6634.
-
Resolution: Fixed

Issue resolved by pull request 5464
[https://github.com/apache/arrow/pull/5464]

> [C++] Do not require flatbuffers or flatbuffers_ep to build
> ---
>
> Key: ARROW-6634
> URL: https://issues.apache.org/jira/browse/ARROW-6634
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Flatbuffers is small enough that we can vendor {{flatbuffers/flatbuffers.h}} 
> and check in the compiled files to make flatbuffers_ep unneeded



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6091) [Rust] [DataFusion] Implement parallel execution for limit

2019-10-03 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-6091.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5509
[https://github.com/apache/arrow/pull/5509]

> [Rust] [DataFusion] Implement parallel execution for limit
> --
>
> Key: ARROW-6091
> URL: https://issues.apache.org/jira/browse/ARROW-6091
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6744) [Rust] Export JsonEqual trait in the array module

2019-10-03 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-6744.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5549
[https://github.com/apache/arrow/pull/5549]

> [Rust] Export JsonEqual trait in the array module
> -
>
> Key: ARROW-6744
> URL: https://issues.apache.org/jira/browse/ARROW-6744
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Kyle McCarthy
>Assignee: Kyle McCarthy
>Priority: Trivial
>  Labels: easyfix, pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> ARROW-5901 added checking for array equality with JSON arrays. This added the 
> JsonEqual trait bound to the Array trait, but it isn't exported, making it 
> effectively private.
> JsonEqual is a public trait, but the equal module is private and the 
> JsonEqual trait isn't re-exported like the ArrayEqual trait.
> AFAIK this makes it impossible to implement your own arrays that are bound by 
> the Array trait.
> I suggest that JsonEqual be re-exported with pub use from the array module, 
> like the ArrayEqual trait.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-6744) [Rust] Export JsonEqual trait in the array module

2019-10-03 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan reassigned ARROW-6744:
--

Assignee: Kyle McCarthy

> [Rust] Export JsonEqual trait in the array module
> -
>
> Key: ARROW-6744
> URL: https://issues.apache.org/jira/browse/ARROW-6744
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Kyle McCarthy
>Assignee: Kyle McCarthy
>Priority: Trivial
>  Labels: easyfix, pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> ARROW-5901 added checking for array equality with JSON arrays. This added the 
> JsonEqual trait bound to the Array trait, but it isn't exported, making it 
> effectively private.
> JsonEqual is a public trait, but the equal module is private and the 
> JsonEqual trait isn't re-exported like the ArrayEqual trait.
> AFAIK this makes it impossible to implement your own arrays that are bound by 
> the Array trait.
> I suggest that JsonEqual be re-exported with pub use from the array module, 
> like the ArrayEqual trait.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6681) [C# -> R] - Record Batches in reverse order?

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6681:
--
Labels: pull-request-available  (was: )

> [C# -> R] - Record Batches in reverse order?
> 
>
> Key: ARROW-6681
> URL: https://issues.apache.org/jira/browse/ARROW-6681
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C#, R
>Affects Versions: 0.14.1
>Reporter: Anthony Abate
>Priority: Minor
>  Labels: pull-request-available
>
> Are 'RecordBatches' in C# being written in reverse order?
> I made a simple test which creates a single-row record batch for each value 
> from 0 to 99 and attempted to read this in R. To my surprise, batch(0) in R 
> had the value 99, not 0.
> This may not seem like a big deal; however, when dealing with 'huge' files, 
> it's more efficient to use record batches / index lookup than attempting to 
> load the entire file into memory.
> Having the order consistent across the different languages / APIs only makes 
> sense - for now I can work around this by reversing the order before 
> writing.
>  
> https://github.com/apache/arrow/issues/5475
>  
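
As a side note, a hypothetical Python sketch (not taken from this report) of 
how the ordering can be checked against the IPC file format: write numbered 
single-row batches and read them back by index; the file name is arbitrary.

{code:python}
import pyarrow as pa

schema = pa.schema([('i', pa.int64())])
batches = [pa.record_batch([pa.array([i])], schema=schema) for i in range(100)]

with pa.OSFile('batches.arrow', 'wb') as sink:
    with pa.RecordBatchFileWriter(sink, schema) as writer:
        for batch in batches:
            writer.write_batch(batch)

reader = pa.RecordBatchFileReader(pa.memory_map('batches.arrow'))
# If writer and reader agree on ordering, batch 0 holds the value 0, not 99.
print(reader.get_batch(0).column(0)[0])
{code}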



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6580) [Java] Support comparison for unsigned integers

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6580:
--
Labels: pull-request-available  (was: )

> [Java] Support comparison for unsigned integers
> ---
>
> Key: ARROW-6580
> URL: https://issues.apache.org/jira/browse/ARROW-6580
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Minor
>  Labels: pull-request-available
>
> In this issue, we support the comparison of unsigned integer vectors, 
> including UInt1Vector, UInt2Vector, UInt4Vector, and UInt8Vector.
> With comparison support for these vectors, sorting them is also supported 
> automatically.
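
As a generic illustration only (this is a Java issue; the snippet below is not 
the Java implementation), the core of an unsigned comparison is reinterpreting 
the signed storage as its unsigned value before comparing:

{code:python}
def compare_unsigned_byte(a: int, b: int) -> int:
    """Compare two signed bytes (-128..127) by their unsigned value (0..255)."""
    ua, ub = a & 0xFF, b & 0xFF   # reinterpret the signed byte as unsigned
    return (ua > ub) - (ua < ub)

# -1 is stored as 0xFF; as an unsigned byte it is 255 and sorts after 1.
assert compare_unsigned_byte(-1, 1) == 1
{code}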



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-6757) [Python] Creating csv.ParseOptions() causes "Windows fatal exception: access violation" with Visual Studio 2017

2019-10-03 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-6757.
---
Resolution: Cannot Reproduce

I was having other problems with my Miniconda -- I installed a new Miniconda, 
re-bootstrapped the dev environment, and then was no longer able to reproduce 
the crash.

> [Python] Creating csv.ParseOptions() causes "Windows fatal exception: access 
> violation" with Visual Studio 2017
> ---
>
> Key: ARROW-6757
> URL: https://issues.apache.org/jira/browse/ARROW-6757
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>
> I encountered this when trying to verify the release with MSVC 2017. It may 
> be particular to this machine or build (though it's 100% reproducible for 
> me). I will check the Windows wheels to see if it occurs there, too
> {code}
> (C:\tmp\arrow-verify-release\conda-env) λ python
> Python 3.7.3 | packaged by conda-forge | (default, Jul  1 2019, 22:01:29) 
> [MSC v.1900 64 bit (AMD64)] :: Anaconda, Inc. on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyarrow.csv as pc
> >>> pc.ParseOptions()
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6757) [Python] Creating csv.ParseOptions() causes "Windows fatal exception: access violation" with Visual Studio 2017

2019-10-03 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-6757:

Fix Version/s: (was: 1.0.0)

> [Python] Creating csv.ParseOptions() causes "Windows fatal exception: access 
> violation" with Visual Studio 2017
> ---
>
> Key: ARROW-6757
> URL: https://issues.apache.org/jira/browse/ARROW-6757
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>
> I encountered this when trying to verify the release with MSVC 2017. It may 
> be particular to this machine or build (though it's 100% reproducible for 
> me). I will check the Windows wheels to see if it occurs there, too
> {code}
> (C:\tmp\arrow-verify-release\conda-env) λ python
> Python 3.7.3 | packaged by conda-forge | (default, Jul  1 2019, 22:01:29) 
> [MSC v.1900 64 bit (AMD64)] :: Anaconda, Inc. on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyarrow.csv as pc
> >>> pc.ParseOptions()
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6786) [C++] arrow-dataset-file-parquet-test is slow

2019-10-03 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6786:
-

 Summary: [C++] arrow-dataset-file-parquet-test is slow
 Key: ARROW-6786
 URL: https://issues.apache.org/jira/browse/ARROW-6786
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou


It takes 15 seconds in debug mode (probably more with ASAN / UBSAN / etc.) to 
run 2 tests that simply iterate through a generated in-memory dataset:
{code}
$ ./build-test/debug/arrow-dataset-file-parquet-test 
Running main() from 
/home/conda/feedstock_root/build_artifacts/gtest_1551008230529/work/googletest/src/gtest_main.cc
[==] Running 2 tests from 1 test case.
[--] Global test environment set-up.
[--] 2 tests from TestParquetFileFormat
[ RUN  ] TestParquetFileFormat.ScanRecordBatchReader
[   OK ] TestParquetFileFormat.ScanRecordBatchReader (7338 ms)
[ RUN  ] TestParquetFileFormat.Inspect
[   OK ] TestParquetFileFormat.Inspect (6222 ms)
[--] 2 tests from TestParquetFileFormat (13560 ms total)

[--] Global test environment tear-down
[==] 2 tests from 1 test case ran. (13560 ms total)
[  PASSED  ] 2 tests.
{code}

Unless it is stressing something in particular, the number of repetitions or 
the batch size can probably be reduced dramatically.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6785) [JS] Remove superfluous child assignment

2019-10-03 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-6785.
-
Resolution: Fixed

Issue resolved by pull request 5394
[https://github.com/apache/arrow/pull/5394]

> [JS] Remove superfluous child assignment
> 
>
> Key: ARROW-6785
> URL: https://issues.apache.org/jira/browse/ARROW-6785
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Wes McKinney
>Assignee: Adam M Krebs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Per PR



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-6785) [JS] Remove superfluous child assignment

2019-10-03 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-6785:
---

Assignee: Adam M Krebs

> [JS] Remove superfluous child assignment
> 
>
> Key: ARROW-6785
> URL: https://issues.apache.org/jira/browse/ARROW-6785
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Wes McKinney
>Assignee: Adam M Krebs
>Priority: Major
> Fix For: 1.0.0
>
>
> Per PR



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6785) [JS] Remove superfluous child assignment

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6785:
--
Labels: pull-request-available  (was: )

> [JS] Remove superfluous child assignment
> 
>
> Key: ARROW-6785
> URL: https://issues.apache.org/jira/browse/ARROW-6785
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Wes McKinney
>Assignee: Adam M Krebs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Per PR



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6785) [JS] Remove superfluous child assignment

2019-10-03 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6785:
---

 Summary: [JS] Remove superfluous child assignment
 Key: ARROW-6785
 URL: https://issues.apache.org/jira/browse/ARROW-6785
 Project: Apache Arrow
  Issue Type: Bug
  Components: JavaScript
Reporter: Wes McKinney
 Fix For: 1.0.0


Per PR



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6764) [C++] Add readahead iterator

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6764:
--
Labels: pull-request-available  (was: )

> [C++] Add readahead iterator
> 
>
> Key: ARROW-6764
> URL: https://issues.apache.org/jira/browse/ARROW-6764
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> This could replace the current ad-hoc ReadaheadSpooler, at least for JSON.
> CSV currently uses non-zero padding, but it could switch to the same strategy 
> as JSON (i.e. keep track of partial / completion blocks).
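
For context, a conceptual sketch of a readahead iterator (not the actual C++ 
implementation): a background thread pulls blocks from the source iterator 
ahead of the consumer, bounded by a queue depth.

{code:python}
import queue
import threading

def readahead(source, depth=4):
    """Yield items from `source` while prefetching up to `depth` items ahead."""
    q = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the source iterator

    def producer():
        for item in source:
            q.put(item)
        q.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is done:
            return
        yield item

# Consumers iterate normally; blocks are fetched ahead of time.
for block in readahead(iter([b'block1', b'block2', b'block3'])):
    print(block)
{code}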



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-1900) [C++] Add kernel functions for determining value range (maximum and minimum) of integer arrays

2019-10-03 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943794#comment-16943794
 ] 

Neal Richardson commented on ARROW-1900:


This would have been helpful in ARROW-3808, not as an optimization but because 
I literally wanted the min and max of an integer array.

> [C++] Add kernel functions for determining value range (maximum and minimum) 
> of integer arrays
> --
>
> Key: ARROW-1900
> URL: https://issues.apache.org/jira/browse/ARROW-1900
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: Analytics
> Fix For: 1.0.0
>
>
> These functions can be useful internally for determining when a "small range" 
> alternative to a hash table can be used for integer arrays. The maximum and 
> minimum is determined in a single scan.
> We already have infrastructure for aggregate kernels, so this would be an 
> easy addition.
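
A conceptual sketch of the idea (not the C++ kernel itself): one pass over the 
values yields both min and max, which can then decide between a dense 
"small range" table and a general hash table; the threshold below is arbitrary.

{code:python}
def min_max(values):
    lo = hi = values[0]
    for v in values[1:]:
        if v < lo:
            lo = v
        elif v > hi:
            hi = v
    return lo, hi

values = [7, 3, 9, 3, 8]
lo, hi = min_max(values)
# Small value range -> a direct-indexed table can replace a hash table.
use_small_range = (hi - lo) < (1 << 16)
{code}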



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6784) [C++][R] Move filter, take, select C++ code from Rcpp to C++ library

2019-10-03 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6784:
--

 Summary: [C++][R] Move filter, take, select C++ code from Rcpp to 
C++ library
 Key: ARROW-6784
 URL: https://issues.apache.org/jira/browse/ARROW-6784
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Neal Richardson
 Fix For: 1.0.0


Followup to ARROW-3808 and some other previous work. Of particular interest:
 * Filter and Take methods for ChunkedArray, in r/src/compute.cpp
 * Methods for that and some other things that apply Array and ChunkedArray 
methods across the columns of a RecordBatch or Table, respectively
 * RecordBatch__select and Table__select to take columns



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6771) [Packaging][Python] Missing pytest dependency from conda and wheel builds

2019-10-03 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-6771.
-
Fix Version/s: (was: 1.0.0)
   0.15.0
   Resolution: Fixed

Issue resolved by pull request 5569
[https://github.com/apache/arrow/pull/5569]

> [Packaging][Python] Missing pytest dependency from conda and wheel builds
> -
>
> Key: ARROW-6771
> URL: https://issues.apache.org/jira/browse/ARROW-6771
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Multiple python packaging nightlies are failing:
> {code}
> Failed Tasks:
> - conda-osx-clang-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-osx-clang-py36
> - conda-osx-clang-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-osx-clang-py37
> - conda-win-vs2015-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-win-vs2015-py36
> - wheel-manylinux1-cp27mu:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-travis-wheel-manylinux1-cp27mu
> - conda-linux-gcc-py27:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-linux-gcc-py27
> - wheel-osx-cp27m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-travis-wheel-osx-cp27m
> - docker-spark-integration:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-circle-docker-spark-integration
> - wheel-win-cp35m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-appveyor-wheel-win-cp35m
> - conda-win-vs2015-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-win-vs2015-py37
> - conda-linux-gcc-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-linux-gcc-py37
> - wheel-manylinux2010-cp27mu:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-travis-wheel-manylinux2010-cp27mu
> - conda-linux-gcc-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-linux-gcc-py36
> - wheel-win-cp37m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-appveyor-wheel-win-cp37m
> - wheel-win-cp36m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-appveyor-wheel-win-cp36m
> - gandiva-jar-osx:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-travis-gandiva-jar-osx
> - conda-osx-clang-py27:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-osx-clang-py27
> {code}
> This is because of the missing, recently introduced pytest-lazy-fixture test 
> dependency:
> {code}
> + pytest -m 'not requires_testing_data' --pyargs pyarrow
> = test session starts 
> ==
> platform linux -- Python 3.7.3, pytest-5.2.0, py-1.8.0, pluggy-0.13.0
> hypothesis profile 'default' ->
> database=DirectoryBasedExampleDatabase('$SRC_DIR/.hypothesis/examples')
> rootdir: $SRC_DIR
> plugins: hypothesis-4.38.1
> collected 1437 items / 1 errors / 3 deselected / 5 skipped / 1428 selected
>  ERRORS 
> 
> __ ERROR collecting tests/test_fs.py 
> ___
> ../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/python3.7/site-packages/pyarrow/tests/test_fs.py:91:
> in <module>
> pytest.lazy_fixture('localfs'),
> E AttributeError: module 'pytest' has no attribute 'lazy_fixture'
> === warnings summary 
> ===
> $PREFIX/lib/python3.7/site-packages/_pytest/mark/structures.py:324
> $PREFIX/lib/python3.7/site-packages/_pytest/mark/structures.py:324:
> PytestUnknownMarkWarning: Unknown pytest.mark.s3 - is this a typo? You
> can register custom marks to avoid this warning - for details, see
> https://docs.pytest.org/en/latest/mark.html
> PytestUnknownMarkWarning,
> -- Docs: https://docs.pytest.org/en/latest/warnings.html
> !!! Interrupted: 1 errors during collection 
> 

[jira] [Updated] (ARROW-6764) [C++] Add readahead iterator

2019-10-03 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6764:
--
Description: 
This could replace the current ad-hoc ReadaheadSpooler, at least for JSON.
CSV currently uses non-zero padding, but it could switch to the same strategy 
as JSON (i.e. keep track of partial / completion blocks).


  was:
The current implementation is very ad-hoc and allows unused padding arguments.

We could refactor it using the Iterator facility.


> [C++] Add readahead iterator
> 
>
> Key: ARROW-6764
> URL: https://issues.apache.org/jira/browse/ARROW-6764
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>
> This could replace the current ad-hoc ReadaheadSpooler, at least for JSON.
> CSV currently uses non-zero padding, but it could switch to the same strategy 
> as JSON (i.e. keep track of partial / completion blocks).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6494) [C++][Dataset] Implement basic PartitionScheme

2019-10-03 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman resolved ARROW-6494.
-
Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request 5443
[https://github.com/apache/arrow/pull/5443]

> [C++][Dataset] Implement basic PartitionScheme
> --
>
> Key: ARROW-6494
> URL: https://issues.apache.org/jira/browse/ARROW-6494
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Ben Kietzman
>Assignee: Ben Kietzman
>Priority: Major
>  Labels: dataset, pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> The PartitionScheme interface parses paths and yields the partition 
> expressions which are encoded in those paths. For example, the Hive partition 
> scheme would yield {{"a"_ = 2 and "b"_ = 3}} from "a=2/b=3/*.parquet".
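
An illustration of the path-parsing idea in Python (the real implementation 
lives in the C++ dataset layer; this is only a sketch of the Hive convention):

{code:python}
def parse_hive_path(path):
    """Map a Hive-style path to its key=value partition fields."""
    fields = {}
    for segment in path.split('/'):
        if '=' in segment:
            key, value = segment.split('=', 1)
            fields[key] = value
    return fields

# "a=2/b=3/part-0.parquet" -> {'a': '2', 'b': '3'},
# i.e. the partition expression "a"_ == 2 and "b"_ == 3.
print(parse_hive_path("a=2/b=3/part-0.parquet"))
{code}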



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6764) [C++] Add readahead iterator

2019-10-03 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6764:
--
Summary: [C++] Add readahead iterator  (was: [C++] Simplify readahead 
implementation)

> [C++] Add readahead iterator
> 
>
> Key: ARROW-6764
> URL: https://issues.apache.org/jira/browse/ARROW-6764
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>
> The current implementation is very ad-hoc and allows unused padding arguments.
> We could refactor it using the Iterator facility.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-5611) [C++] Improve clang-tidy speed

2019-10-03 Thread Francois Saint-Jacques (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943757#comment-16943757
 ] 

Francois Saint-Jacques commented on ARROW-5611:
---

One major win would be to scope the run to only the files modified in the 
current branch, instead of the whole directory, like the iwyu wrapper does.
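
A rough sketch of that scoping, assuming a checkout where the compilation 
database lives in a build/ directory (the directory name and file extensions 
are assumptions for illustration):

{code:python}
import subprocess

# Lint only C++ files changed relative to master, not the whole tree.
changed = subprocess.check_output(
    ['git', 'diff', '--name-only', 'master...HEAD'], text=True).splitlines()
cpp_files = [f for f in changed if f.endswith(('.cc', '.h'))]
if cpp_files:
    subprocess.run(['clang-tidy', '-p', 'build'] + cpp_files, check=True)
{code}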

> [C++] Improve clang-tidy speed
> --
>
> Key: ARROW-5611
> URL: https://issues.apache.org/jira/browse/ARROW-5611
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Francois Saint-Jacques
>Priority: Minor
> Fix For: 1.0.0
>
>
> See https://github.com/apache/arrow/pull/4293#issuecomment-501950675



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6774) [Rust] Reading parquet file is slow

2019-10-03 Thread Adam Lippai (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943755#comment-16943755
 ] 

Adam Lippai commented on ARROW-6774:


I've seen some nice work in 
[https://github.com/apache/arrow/blob/master/rust/parquet/src/column/reader.rs] 
and 
[https://github.com/apache/arrow/blob/master/rust/parquet/src/arrow/array_reader.rs], 
but I couldn't figure out how to use it. [~liurenjie1024], could you perhaps 
help me? 

> [Rust] Reading parquet file is slow
> ---
>
> Key: ARROW-6774
> URL: https://issues.apache.org/jira/browse/ARROW-6774
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.15.0
>Reporter: Adam Lippai
>Priority: Major
>
> Using the example at 
> [https://github.com/apache/arrow/tree/master/rust/parquet] is slow.
> The snippet 
> {code:none}
> let reader = SerializedFileReader::new(file).unwrap();
> let mut iter = reader.get_row_iter(None).unwrap();
> let start = Instant::now();
> while let Some(record) = iter.next() {}
> let duration = start.elapsed();
> println!("{:?}", duration);
> {code}
> Runs for 17sec for a ~160MB parquet file.
> If there is a more effective way to load a parquet file, it would be nice to 
> add it to the readme.
> P.S.: My goal is to construct an ndarray from it, I'd be happy for any tips.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6783) [C++] Provide API for reconstruction of RecordBatch from Flatbuffer containing process memory addresses instead of relative offsets into an IPC message

2019-10-03 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6783:
---

 Summary: [C++] Provide API for reconstruction of RecordBatch from 
Flatbuffer containing process memory addresses instead of relative offsets into 
an IPC message
 Key: ARROW-6783
 URL: https://issues.apache.org/jira/browse/ARROW-6783
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


A lot of our development has focused on _inter_process communication rather 
than _in_process. We should start by making sure we have disassembly and 
reassembly implemented where the Buffer Flatbuffers values contain process 
memory addresses rather than offsets. This may require a bit of refactoring so 
we can use the same reassembly code path for both use cases



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6782) [C++] Build minimal core Arrow libraries without any Boost headers

2019-10-03 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6782:
---

 Summary: [C++] Build minimal core Arrow libraries without any 
Boost headers
 Key: ARROW-6782
 URL: https://issues.apache.org/jira/browse/ARROW-6782
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


We have a couple of places where these are used. It would be good to be able to 
build without any Boost headers available



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6613) [C++] Remove dependency on boost::filesystem

2019-10-03 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-6613.
---
Fix Version/s: (was: 1.0.0)
   0.15.0
   Resolution: Fixed

Issue resolved by pull request 5545
[https://github.com/apache/arrow/pull/5545]

> [C++] Remove dependency on boost::filesystem
> 
>
> Key: ARROW-6613
> URL: https://issues.apache.org/jira/browse/ARROW-6613
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> See ARROW-2196 for details.
> boost::filesystem should not be required for base functionality at least 
> (including filesystems, probably).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6774) [Rust] Reading parquet file is slow

2019-10-03 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943703#comment-16943703
 ] 

Wes McKinney commented on ARROW-6774:
-

Row-by-row iteration is going to be slow compared with vectorized / 
column-by-column reads. This unfinished PR was related to this (I think?) but 
there are Arrow-based readers available that don't require it

https://github.com/apache/arrow/pull/3461

> [Rust] Reading parquet file is slow
> ---
>
> Key: ARROW-6774
> URL: https://issues.apache.org/jira/browse/ARROW-6774
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.15.0
>Reporter: Adam Lippai
>Priority: Major
>
> Using the example at 
> [https://github.com/apache/arrow/tree/master/rust/parquet] is slow.
> The snippet 
> {code:none}
> let reader = SerializedFileReader::new(file).unwrap();
> let mut iter = reader.get_row_iter(None).unwrap();
> let start = Instant::now();
> while let Some(record) = iter.next() {}
> let duration = start.elapsed();
> println!("{:?}", duration);
> {code}
> Runs for 17sec for a ~160MB parquet file.
> If there is a more effective way to load a parquet file, it would be nice to 
> add it to the readme.
> P.S.: My goal is to construct an ndarray from it, I'd be happy for any tips.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-2428) [Python] Add API to map Arrow types (including extension types) to pandas ExtensionArray instances for to_pandas conversions

2019-10-03 Thread Joris Van den Bossche (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943671#comment-16943671
 ] 

Joris Van den Bossche commented on ARROW-2428:
--

Any more thoughts on this?

I implemented a POC for case 1 described above in 
https://github.com/apache/arrow/pull/5512

This allows roundtripping pandas ExtensionArrays, assuming the 
pandas.ExtensionDtype implements a {{\_\_from_arrow\_\_}} method to convert an Arrow 
array into a pandas ExtensionArray of that dtype (so it can be put in the 
resulting DataFrame as an extension array).

It doesn't yet handle the other cases described above, though.


> [Python] Add API to map Arrow types (including extension types) to pandas 
> ExtensionArray instances for to_pandas conversions
> 
>
> Key: ARROW-2428
> URL: https://issues.apache.org/jira/browse/ARROW-2428
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Uwe Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> With the next release of Pandas, it will be possible to define custom column 
> types that back a {{pandas.Series}}. Thus we will not be able to cover all 
> possible column types in the {{to_pandas}} conversion by default as we won't 
> be aware of all extension arrays.
> To enable users to create {{ExtensionArray}} instances from Arrow columns in 
> the {{to_pandas}} conversion, we should provide a hook in the {{to_pandas}} 
> call where they can overload the default conversion routines with the ones 
> that produce their {{ExtensionArray}} instances.
> This should avoid additional copies in the case where we would nowadays first 
> convert the Arrow column into a default Pandas column (probably of object 
> type) and the user would afterwards convert it to a more efficient 
> {{ExtensionArray}}. This hook here will be especially useful when you build 
> {{ExtensionArrays}} where the storage is backed by Arrow.
> The meta-issue that tracks the implementation inside of Pandas is: 
> https://github.com/pandas-dev/pandas/issues/19696
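
As a small usage sketch of the hook discussed above: in recent pandas the 
nullable Int64Dtype implements {{\_\_from_arrow\_\_}}, and recent pyarrow exposes a 
{{types_mapper}} argument on {{to_pandas}} (both postdate this comment), so the 
conversion can be handed back to the pandas extension dtype:

{code:python}
import pandas as pd
import pyarrow as pa

table = pa.table({'a': pa.array([1, None, 3], pa.int64())})

# types_mapper maps an Arrow type to a pandas dtype; pyarrow then calls the
# dtype's __from_arrow__ to build the ExtensionArray-backed column.
df = table.to_pandas(types_mapper={pa.int64(): pd.Int64Dtype()}.get)
print(df['a'].dtype)   # Int64, backed by a pandas ExtensionArray
{code}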



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6781) [C++] Improve and consolidate ARROW_CHECK, DCHECK macros

2019-10-03 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-6781:
---

 Summary: [C++] Improve and consolidate ARROW_CHECK, DCHECK macros
 Key: ARROW-6781
 URL: https://issues.apache.org/jira/browse/ARROW-6781
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Ben Kietzman
Assignee: Ben Kietzman


Currently we have multiple macros like {{DCHECK_EQ}} and {{DCHECK_LT}} which 
check various comparisons but don't report anything about their operands. 
Furthermore, the "stream to assertion" pattern for appending extra info has 
proven fragile. I propose a new unified macro which can capture operands of 
comparisons and report them:

{code:cpp}
  int three = 3;
  int five = 5;
  DCHECK(three == five, "extra: ", 1, 2, five);
{code}

Results in check failure messages like:
{code}
F1003 11:12:46.174767  4166 logging_test.cc:141]  Check failed: three == five
  LHS: 3
  RHS: 5
extra: 125
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6762) [C++] JSON reader segfaults on newline

2019-10-03 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-6762.
---
Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request 5564
[https://github.com/apache/arrow/pull/5564]

> [C++] JSON reader segfaults on newline
> --
>
> Key: ARROW-6762
> URL: https://issues.apache.org/jira/browse/ARROW-6762
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Joris Van den Bossche
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: json, pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Using the {{SampleRecord.jl}} attachment from ARROW-6737, I notice that 
> trying to read this file on master results in a segfault:
> {code}
> In [1]: from pyarrow import json 
>...: import pyarrow.parquet as pq 
>...:  
>...: r = json.read_json('SampleRecord.jl') 
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> F1002 09:56:55.362766 13035 reader.cc:93]  Check failed: 
> (string_view(*next_partial).find_first_not_of(" \t\n\r")) == 
> (string_view::npos) 
> *** Check failure stack trace: ***
> Aborted (core dumped)
> {code}
> while with 0.14.1 this works fine:
> {code}
> In [24]: from pyarrow import json 
> ...: import pyarrow.parquet as pq 
> ...:  
> ...: r = json.read_json('SampleRecord.jl')
>   
>
> In [25]: r
>   
>
> Out[25]: 
> pyarrow.Table
> _type: string
> provider_name: string
> arrival: timestamp[s]
> berthed: timestamp[s]
> berth: null
> cargoes: list<item: struct<movement: string, product: string, volume: string, 
> volume_unit: string, buyer: null, seller: null>>
>   child 0, item: struct<movement: string, product: string, volume: string, 
> volume_unit: string, buyer: null, seller: null>
>   child 0, movement: string
>   child 1, product: string
>   child 2, volume: string
>   child 3, volume_unit: string
>   child 4, buyer: null
>   child 5, seller: null
> departure: timestamp[s]
> eta: null
> installation: null
> port_name: string
> next_zone: null
> reported_date: timestamp[s]
> shipping_agent: null
> vessel: struct<beam: null, build_year: null, call_sign: null, dead_weight: 
> null, dwt: null, flag_code: null, flag_name: null, gross_tonnage: null, imo: 
> string, length: int64, mmsi: null, name: string, type: null, vessel_type: 
> null>
>   child 0, beam: null
>   child 1, build_year: null
>   child 2, call_sign: null
>   child 3, dead_weight: null
>   child 4, dwt: null
>   child 5, flag_code: null
>   child 6, flag_name: null
>   child 7, gross_tonnage: null
>   child 8, imo: string
>   child 9, length: int64
>   child 10, mmsi: null
>   child 11, name: string
>   child 12, type: null
>   child 13, vessel_type: null
> In [26]: pa.__version__   
>   
>
> Out[26]: '0.14.1'
> {code}
> cc [~apitrou] [~bkietz]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-6764) [C++] Simplify readahead implementation

2019-10-03 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-6764:
-

Assignee: Antoine Pitrou

> [C++] Simplify readahead implementation
> ---
>
> Key: ARROW-6764
> URL: https://issues.apache.org/jira/browse/ARROW-6764
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>
> The current implementation is very ad-hoc and allows unused padding arguments.
> We could refactor it using the Iterator facility.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6780) [C++][Parquet] Support DurationType in writing/reading parquet

2019-10-03 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-6780:


 Summary: [C++][Parquet] Support DurationType in writing/reading 
parquet
 Key: ARROW-6780
 URL: https://issues.apache.org/jira/browse/ARROW-6780
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Joris Van den Bossche


Currently this is not supported:

{code}
In [37]: table = pa.table({'a': pa.array([1, 2], pa.duration('s'))}) 

In [39]: table
Out[39]: 
pyarrow.Table
a: duration[s]

In [41]: pq.write_table(table, 'test_duration.parquet')
...
ArrowNotImplementedError: Unhandled type for Arrow to Parquet schema 
conversion: duration[s]
{code}

There is no direct mapping to Parquet logical types. There is an INTERVAL type, 
but that more closely matches Arrow's (YEAR_MONTH or DAY_TIME) interval type.

However, the duration values could be stored as plain integers and, based on the 
serialized Arrow schema, restored when reading back in.
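
A sketch of that manual workaround, assuming a pyarrow version in which 
int64 <-> duration casts exist (see ARROW-6778); the file name is arbitrary:

{code:python}
import pyarrow as pa
import pyarrow.parquet as pq

durations = pa.array([1, 2], pa.duration('s'))

# Store the underlying values as plain int64 for the Parquet round trip.
pq.write_table(pa.table({'a': durations.cast(pa.int64())}), 'durations.parquet')

# On read, restore the duration type (in the proposal this would be driven by
# the serialized Arrow schema rather than done by hand).
restored = pq.read_table('durations.parquet').column('a').cast(pa.duration('s'))
{code}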



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6779) [Python] Conversion from datetime.datetime to timstamp('ns') can overflow

2019-10-03 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-6779:


 Summary: [Python] Conversion from datetime.datetime to 
timstamp('ns') can overflow
 Key: ARROW-6779
 URL: https://issues.apache.org/jira/browse/ARROW-6779
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Joris Van den Bossche


In the Python conversion of datetime scalars, there is no check for integer 
overflow:

{code}
In [32]: pa.array([datetime.datetime(3000, 1, 1)], pa.timestamp('ns'))  

   
Out[32]: 

[
  1830-11-23 00:50:52.580896768
]
{code}

So when the target type has nanosecond unit, this can give wrong results (I 
don't think the other resolutions can overflow, given the limited range of 
years datetime.datetime supports).

We should probably check for this case and raise an error.
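
The arithmetic behind the overflow: int64 nanoseconds span roughly ±292 years 
around the epoch (about 1677-2262), so datetime(3000, 1, 1) cannot be 
represented and silently wraps around. A sketch of the kind of guard the 
conversion could apply (illustrative only, not the actual implementation):

{code:python}
import datetime

def checked_ns_since_epoch(dt: datetime.datetime) -> int:
    epoch = datetime.datetime(1970, 1, 1)
    micros = (dt - epoch) // datetime.timedelta(microseconds=1)
    ns = micros * 1000
    if not (-2**63 <= ns <= 2**63 - 1):
        raise OverflowError(f"{dt} does not fit in a nanosecond timestamp")
    return ns

# datetime(3000, 1, 1) needs ~3.2e19 ns, beyond the int64 limit of ~9.2e18:
# checked_ns_since_epoch(datetime.datetime(3000, 1, 1))  -> OverflowError
{code}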



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6778) [C++] Support DurationType in Cast kernel

2019-10-03 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-6778:
-
Description: 
Duration is not yet supported in basic cast operations (tested using the Python 
bindings from ARROW-5855, which are still on my branch, not yet merged):

{code}
In [25]: arr = pa.array([1, 2])

In [26]: arr.cast(pa.duration('s'))  
...
ArrowNotImplementedError: No cast implemented from int64 to duration[s]

In [27]: arr = pa.array([1, 2], pa.duration('s'))  

In [28]: arr.cast(pa.duration('ms'))
...
ArrowNotImplementedError: No cast implemented from duration[s] to duration[ms]
{code}


> [C++] Support DurationType in Cast kernel
> -
>
> Key: ARROW-6778
> URL: https://issues.apache.org/jira/browse/ARROW-6778
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Joris Van den Bossche
>Priority: Major
>
> Duration is not yet supported in basic cast operations (tested using the Python 
> bindings from ARROW-5855, which are still on my branch, not yet merged):
> {code}
> In [25]: arr = pa.array([1, 2])
> In [26]: arr.cast(pa.duration('s'))  
> ...
> ArrowNotImplementedError: No cast implemented from int64 to duration[s]
> In [27]: arr = pa.array([1, 2], pa.duration('s'))  
> In [28]: arr.cast(pa.duration('ms'))
> ...
> ArrowNotImplementedError: No cast implemented from duration[s] to duration[ms]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6778) [C++] Support DurationType in Cast kernel

2019-10-03 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-6778:


 Summary: [C++] Support DurationType in Cast kernel
 Key: ARROW-6778
 URL: https://issues.apache.org/jira/browse/ARROW-6778
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Joris Van den Bossche






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6773) [C++] Filter kernel returns invalid data when filtering with an Array slice

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6773:
--
Labels: pull-request-available  (was: )

> [C++] Filter kernel returns invalid data when filtering with an Array slice
> ---
>
> Key: ARROW-6773
> URL: https://issues.apache.org/jira/browse/ARROW-6773
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> See ARROW-3808. This failing test reproduces the issue:
> {code:java}
> --- a/cpp/src/arrow/compute/kernels/filter_test.cc
> +++ b/cpp/src/arrow/compute/kernels/filter_test.cc
> @@ -151,6 +151,12 @@ TYPED_TEST(TestFilterKernelWithNumeric, FilterNumeric) {
>this->AssertFilter("[7, 8, 9]", "[null, 1, 0]", "[null, 8]");
>this->AssertFilter("[7, 8, 9]", "[1, null, 1]", "[7, null, 9]");
>  
> +  this->AssertFilterArrays(
> +ArrayFromJSON(this->type_singleton(), "[7, 8, 9]"),
> +ArrayFromJSON(boolean(), "[0, 1, 1, 1, 0, 1]")->Slice(3, 3),
> +ArrayFromJSON(this->type_singleton(), "[7, 9]")
> +  );
> +
> {code}
> {code:java}
> arrow/cpp/src/arrow/testing/gtest_util.cc:82: Failure
> Failed
> @@ -2, +2 @@
> +0
> [  FAILED  ] TestFilterKernelWithNumeric/9.FilterNumeric, where TypeParam = 
> arrow::DoubleType (0 ms)
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6773) [C++] Filter kernel returns invalid data when filtering with an Array slice

2019-10-03 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman resolved ARROW-6773.
-
Fix Version/s: (was: 1.0.0)
   0.15.0
   Resolution: Fixed

Issue resolved by pull request 5570
[https://github.com/apache/arrow/pull/5570]

> [C++] Filter kernel returns invalid data when filtering with an Array slice
> ---
>
> Key: ARROW-6773
> URL: https://issues.apache.org/jira/browse/ARROW-6773
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See ARROW-3808. This failing test reproduces the issue:
> {code:java}
> --- a/cpp/src/arrow/compute/kernels/filter_test.cc
> +++ b/cpp/src/arrow/compute/kernels/filter_test.cc
> @@ -151,6 +151,12 @@ TYPED_TEST(TestFilterKernelWithNumeric, FilterNumeric) {
>this->AssertFilter("[7, 8, 9]", "[null, 1, 0]", "[null, 8]");
>this->AssertFilter("[7, 8, 9]", "[1, null, 1]", "[7, null, 9]");
>  
> +  this->AssertFilterArrays(
> +ArrayFromJSON(this->type_singleton(), "[7, 8, 9]"),
> +ArrayFromJSON(boolean(), "[0, 1, 1, 1, 0, 1]")->Slice(3, 3),
> +ArrayFromJSON(this->type_singleton(), "[7, 9]")
> +  );
> +
> {code}
> {code:java}
> arrow/cpp/src/arrow/testing/gtest_util.cc:82: Failure
> Failed
> @@ -2, +2 @@
> +0
> [  FAILED  ] TestFilterKernelWithNumeric/9.FilterNumeric, where TypeParam = 
> arrow::DoubleType (0 ms)
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6770) [CI][Travis] Download Minio quietly

2019-10-03 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-6770.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5568
[https://github.com/apache/arrow/pull/5568]

> [CI][Travis] Download Minio quietly
> ---
>
> Key: ARROW-6770
> URL: https://issues.apache.org/jira/browse/ARROW-6770
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> To remove verbose output 
> https://travis-ci.org/pitrou/arrow/jobs/592577525#L191



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6686) [CI] Pull and push docker images to speed up the nightly builds

2019-10-03 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-6686.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5485
[https://github.com/apache/arrow/pull/5485]

> [CI] Pull and push docker images to speed up the nightly builds 
> 
>
> Key: ARROW-6686
> URL: https://issues.apache.org/jira/browse/ARROW-6686
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: CI
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-6767) [JS] lazily bind batches in scan/scanReverse

2019-10-03 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-6767:
--

Assignee: Taylor Baldwin

> [JS] lazily bind batches in scan/scanReverse
> 
>
> Key: ARROW-6767
> URL: https://issues.apache.org/jira/browse/ARROW-6767
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Taylor Baldwin
>Assignee: Taylor Baldwin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Call {{bind(batch)}} lazily in {{scan}} and {{scanReverse}}, that is, only 
> when the predicate has matched a record in a batch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6767) [JS] lazily bind batches in scan/scanReverse

2019-10-03 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-6767.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5565
[https://github.com/apache/arrow/pull/5565]

> [JS] lazily bind batches in scan/scanReverse
> 
>
> Key: ARROW-6767
> URL: https://issues.apache.org/jira/browse/ARROW-6767
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Taylor Baldwin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Call {{bind(batch)}} lazily in {{scan}} and {{scanReverse}}, that is, only 
> when the predicate has matched a record in a batch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)