[jira] [Commented] (ARROW-5539) [Java] Test failure

2019-06-11 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860706#comment-16860706
 ] 

Antoine Pitrou commented on ARROW-5539:
---

You're right, that worked. Thank you!

> [Java] Test failure
> ---
>
> Key: ARROW-5539
> URL: https://issues.apache.org/jira/browse/ARROW-5539
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Antoine Pitrou
>Priority: Major
>
> I know next to nothing about Java ecosystems. I'm trying to build and test 
> locally, and get the following failures:
> {code}
> [ERROR] Tests run: 6, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.011 
> s <<< FAILURE! - in io.netty.buffer.TestArrowBuf
> [ERROR] testSetBytesSliced(io.netty.buffer.TestArrowBuf)  Time elapsed: 0.004 
> s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> io.netty.buffer.ArrowBuf.setBytes(ILjava/nio/ByteBuffer;II)Lio/netty/buffer/ArrowBuf;
>   at 
> io.netty.buffer.TestArrowBuf.testSetBytesSliced(TestArrowBuf.java:100)
> [ERROR] testSetBytesUnsliced(io.netty.buffer.TestArrowBuf)  Time elapsed: 0 s 
>  <<< ERROR!
> java.lang.NoSuchMethodError: 
> io.netty.buffer.ArrowBuf.setBytes(ILjava/nio/ByteBuffer;II)Lio/netty/buffer/ArrowBuf;
>   at 
> io.netty.buffer.TestArrowBuf.testSetBytesUnsliced(TestArrowBuf.java:121)
> 12:27:49.541 [main] WARN  o.apache.arrow.memory.BoundsChecking - 
> "drill.enable_unsafe_memory_access" has been renamed to 
> "arrow.enable_unsafe_memory_access"
> 12:27:49.543 [main] WARN  o.apache.arrow.memory.BoundsChecking - 
> "arrow.enable_unsafe_memory_access" can be set to:  true (to not check) or 
> false (to check, default)
> 12:27:49.617 [main] WARN  o.apache.arrow.memory.BoundsChecking - 
> "drill.enable_unsafe_memory_access" has been renamed to 
> "arrow.enable_unsafe_memory_access"
> 12:27:49.619 [main] WARN  o.apache.arrow.memory.BoundsChecking - 
> "arrow.enable_unsafe_memory_access" can be set to:  true (to not check) or 
> false (to check, default)
> {code}
> Java version is the following:
> {code}
> $ java -version
> java version "1.8.0_201"
> Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
> Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
> {code}
> I'm on Ubuntu 18.04. Perhaps I need another JVM?
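
For readers unfamiliar with the notation in the NoSuchMethodError above: the JVM identifies a method by its name plus a type descriptor, and this error generally means that no method with exactly that descriptor was found on the class that was loaded at run time. The sketch below is only an illustration of how to read such a descriptor. The helper names are hypothetical and are not part of Arrow or Netty.

```python
# Illustrative helper (hypothetical, not part of Arrow or Netty): decode a
# JVM method descriptor such as the one in the NoSuchMethodError above.
PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def _parse_type(desc, pos):
    """Parse one type starting at desc[pos]; return (name, next_pos)."""
    dims = 0
    while desc[pos] == "[":          # each '[' adds one array dimension
        dims += 1
        pos += 1
    if desc[pos] == "L":             # object type: L<class name>;
        end = desc.index(";", pos)
        name = desc[pos + 1:end].replace("/", ".")
        pos = end + 1
    else:                            # single-letter primitive type
        name = PRIMITIVES[desc[pos]]
        pos += 1
    return name + "[]" * dims, pos


def decode_method_descriptor(desc):
    """Return (parameter_types, return_type) for a descriptor like '(I)V'."""
    if not desc.startswith("("):
        raise ValueError("method descriptor must start with '('")
    close = desc.index(")")
    params, pos = [], 1
    while pos < close:
        name, pos = _parse_type(desc, pos)
        params.append(name)
    return_type, _ = _parse_type(desc, close + 1)
    return params, return_type


params, ret = decode_method_descriptor(
    "(ILjava/nio/ByteBuffer;II)Lio/netty/buffer/ArrowBuf;"
)
print(ret, "setBytes(" + ", ".join(params) + ")")
```

Read this way, the test expected a setBytes(int, java.nio.ByteBuffer, int, int) method returning io.netty.buffer.ArrowBuf.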



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (ARROW-5539) [Java] Test failure

2019-06-11 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou closed ARROW-5539.
-
Resolution: Workaround






[jira] [Assigned] (ARROW-5509) [R] write_parquet()

2019-06-11 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/ARROW-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Romain François reassigned ARROW-5509:
--

Assignee: Romain François  (was: Uwe L. Korn)

> [R] write_parquet()
> ---
>
> Key: ARROW-5509
> URL: https://issues.apache.org/jira/browse/ARROW-5509
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Romain François
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We can read but not yet write. The C++ library supports this and pyarrow does 
> it.





[jira] [Assigned] (ARROW-5504) [R] move use_threads argument to global option

2019-06-11 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/ARROW-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Romain François reassigned ARROW-5504:
--

Assignee: Romain François

> [R] move use_threads argument to global option
> --
>
> Key: ARROW-5504
> URL: https://issues.apache.org/jira/browse/ARROW-5504
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Romain François
>Priority: Minor
> Fix For: 0.14.0
>
>
> Why wouldn't you want to use the multithreaded API for reading data from 
> arrow into R? We shouldn't clutter our function signatures with options that 
> people won't use.





[jira] [Commented] (ARROW-5509) [R] write_parquet()

2019-06-11 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860771#comment-16860771
 ] 

Uwe L. Korn commented on ARROW-5509:


[~romainfrancois] see my PR, I'm already working on this and will continue 
today.

> [R] write_parquet()
> ---
>
> Key: ARROW-5509
> URL: https://issues.apache.org/jira/browse/ARROW-5509
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Romain François
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We can read but not yet write. The C++ library supports this and pyarrow does 
> it.





[jira] [Assigned] (ARROW-5547) [C++][FlightRPC] arrow-flight.pc isn't provided

2019-06-11 Thread Yosuke Shiro (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yosuke Shiro reassigned ARROW-5547:
---

Assignee: Yosuke Shiro

> [C++][FlightRPC] arrow-flight.pc isn't provided
> ---
>
> Key: ARROW-5547
> URL: https://issues.apache.org/jira/browse/ARROW-5547
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC
>Reporter: Sutou Kouhei
>Assignee: Yosuke Shiro
>Priority: Major
>






[jira] [Created] (ARROW-5551) [Go] invalid FixedSizeArray representation

2019-06-11 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5551:
--

 Summary: [Go] invalid FixedSizeArray representation
 Key: ARROW-5551
 URL: https://issues.apache.org/jira/browse/ARROW-5551
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet


FixedSizeArrays are currently represented with a 3-buffer data layout, but the 
C++ definition expects a 2-buffer data layout (as for all the primitive 
arrays).

(Uncovered while trying to roundtrip all "integration" tests.)
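
To make the 2-buffer layout concrete, here is a minimal sketch, assuming the issue refers to fixed-width values such as fixed-size binary. In the Arrow columnar format a fixed-width array carries a validity bitmap plus a single contiguous values buffer, with no offsets buffer. The function name is hypothetical and this is not the Go implementation.

```python
def build_fixed_size_buffers(values, byte_width):
    """Pack values (bytes or None) into (validity_bitmap, data) buffers.

    Hypothetical helper for illustration; not taken from any Arrow library.
    """
    validity = bytearray((len(values) + 7) // 8)
    data = bytearray(len(values) * byte_width)
    for i, value in enumerate(values):
        if value is None:
            continue  # null slot: validity bit stays 0, data stays zeroed
        if len(value) != byte_width:
            raise ValueError("value %d is not %d bytes wide" % (i, byte_width))
        validity[i // 8] |= 1 << (i % 8)  # least-significant-bit numbering
        data[i * byte_width:(i + 1) * byte_width] = value
    return bytes(validity), bytes(data)


# Three slots of width 2, the middle one null
validity, data = build_fixed_size_buffers([b"ab", None, b"cd"], byte_width=2)
print(validity, data)
```

Because every slot has the same width, slot i starts at byte i * byte_width, which is why no third (offsets) buffer is needed.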





[jira] [Created] (ARROW-5552) Go: make Schema and Field implement Stringer

2019-06-11 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5552:
--

 Summary: Go: make Schema and Field implement Stringer
 Key: ARROW-5552
 URL: https://issues.apache.org/jira/browse/ARROW-5552
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet








[jira] [Updated] (ARROW-5504) [R] move use_threads argument to global option

2019-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5504:
--
Labels: pull-request-available  (was: )

> [R] move use_threads argument to global option
> --
>
> Key: ARROW-5504
> URL: https://issues.apache.org/jira/browse/ARROW-5504
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Romain François
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> Why wouldn't you want to use the multithreaded API for reading data from 
> arrow into R? We shouldn't clutter our function signatures with options that 
> people won't use.





[jira] [Updated] (ARROW-5552) [Go] make Schema and Field implement Stringer

2019-06-11 Thread Sebastien Binet (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastien Binet updated ARROW-5552:
---
Summary: [Go] make Schema and Field implement Stringer  (was: Go: make 
Schema and Field implement Stringer)

> [Go] make Schema and Field implement Stringer
> -
>
> Key: ARROW-5552
> URL: https://issues.apache.org/jira/browse/ARROW-5552
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Reporter: Sebastien Binet
>Assignee: Sebastien Binet
>Priority: Major
>






[jira] [Resolved] (ARROW-5544) [Archery] should not return non-zero in `benchmark diff` sub command on regression

2019-06-11 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-5544.

   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4512
[https://github.com/apache/arrow/pull/4512]

> [Archery] should not return non-zero in `benchmark diff` sub command on 
> regression
> --
>
> Key: ARROW-5544
> URL: https://issues.apache.org/jira/browse/ARROW-5544
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When a regression is detected, but the command ran successfully, it should 
> return zero. Currently it returns the number of regressions. This is to play 
> better with ursabot. It should be left to the user to decide what to do with 
> the JSON data.
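
The behaviour requested above can be sketched as follows. This is not Archery's code: the function names, the input shape ({name: seconds}), and the 5% threshold are all assumptions made for illustration. The point is only that the exit code reports whether the comparison ran, while regressions are reported in the JSON output.

```python
import json


def diff_benchmarks(baseline, contender, threshold=0.05):
    """Compare two {name: seconds} mappings (hypothetical input shape)."""
    report = []
    for name in sorted(baseline):
        change = (contender[name] - baseline[name]) / baseline[name]
        report.append({
            "benchmark": name,
            "change": round(change, 4),
            # Assumption: a slowdown above the threshold is a regression
            "regression": change > threshold,
        })
    return report


def main(baseline, contender):
    """Print the report as JSON and return the process exit code."""
    report = diff_benchmarks(baseline, contender)
    print(json.dumps(report))
    # The comparison itself succeeded, so return 0 even if the report
    # contains regressions; the caller inspects the JSON to decide.
    return 0


exit_code = main({"sum": 1.0, "sort": 2.0}, {"sum": 1.5, "sort": 2.0})
```

A caller such as a CI bot can then parse the JSON and apply its own policy, while a non-zero exit code remains reserved for genuine command failures.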





[jira] [Assigned] (ARROW-5503) [R] add read_json()

2019-06-11 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/ARROW-5503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Romain François reassigned ARROW-5503:
--

Assignee: Romain François

> [R] add read_json()
> ---
>
> Key: ARROW-5503
> URL: https://issues.apache.org/jira/browse/ARROW-5503
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Romain François
>Priority: Major
> Fix For: 0.14.0
>
>
> The C++ library gained a JSON file reader last month, and pyarrow already has 
> bindings for it. R should have it too.





[jira] [Updated] (ARROW-5552) [Go] make Schema and Field implement Stringer

2019-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5552:
--
Labels: pull-request-available  (was: )

> [Go] make Schema and Field implement Stringer
> -
>
> Key: ARROW-5552
> URL: https://issues.apache.org/jira/browse/ARROW-5552
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Reporter: Sebastien Binet
>Assignee: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Resolved] (ARROW-5552) [Go] make Schema and Field implement Stringer

2019-06-11 Thread Sebastien Binet (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastien Binet resolved ARROW-5552.

   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4516
[https://github.com/apache/arrow/pull/4516]

> [Go] make Schema and Field implement Stringer
> -
>
> Key: ARROW-5552
> URL: https://issues.apache.org/jira/browse/ARROW-5552
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Reporter: Sebastien Binet
>Assignee: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-5551) [Go] invalid FixedSizeArray representation

2019-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5551:
--
Labels: pull-request-available  (was: )

> [Go] invalid FixedSizeArray representation
> --
>
> Key: ARROW-5551
> URL: https://issues.apache.org/jira/browse/ARROW-5551
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Go
>Reporter: Sebastien Binet
>Assignee: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
>
> FixedSizeArrays are currently represented with a 3-buffer data layout, but 
> the C++ definition expects a 2-buffer data layout (as for all the primitive 
> arrays).
> (Uncovered while trying to roundtrip all "integration" tests.)





[jira] [Updated] (ARROW-5503) [R] add read_json()

2019-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5503:
--
Labels: pull-request-available  (was: )

> [R] add read_json()
> ---
>
> Key: ARROW-5503
> URL: https://issues.apache.org/jira/browse/ARROW-5503
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Romain François
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> The C++ library gained a JSON file reader last month, and pyarrow already has 
> bindings for it. R should have it too.





[jira] [Commented] (ARROW-5502) [R] file readers should mmap

2019-06-11 Thread JIRA


[ 
https://issues.apache.org/jira/browse/ARROW-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861084#comment-16861084
 ] 

Romain François commented on ARROW-5502:


You can memory map right now, although at this point data is being copied to R 
vectors rather than borrowed from the memory-mapped file. We'll need to use 
ALTREP to go further. 

 

The file argument of most reading functions may be an instance of 
arrow::io::MemoryMappedFile, which you get by using the mmap_open() function in 
R: 
{code}
library(arrow, warn.conflicts = FALSE)
library(tibble)
tf <- tempfile()
write.csv(iris, tf, row.names = FALSE, quote = FALSE)
f <- mmap_open(tf)
f
#> arrow::io::MemoryMappedFile
tab <- read_csv_arrow(f)
as_tibble(tab)
#> # A tibble: 150 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>  
#> 1 5.1 3.5 1.4 0.2 setosa 
#> 2 4.9 3 1.4 0.2 setosa 
#> 3 4.7 3.2 1.3 0.2 setosa 
#> 4 4.6 3.1 1.5 0.2 setosa 
#> 5 5 3.6 1.4 0.2 setosa 
#> 6 5.4 3.9 1.7 0.4 setosa 
#> 7 4.6 3.4 1.4 0.3 setosa 
#> 8 5 3.4 1.5 0.2 setosa 
#> 9 4.4 2.9 1.4 0.2 setosa 
#> 10 4.9 3.1 1.5 0.1 setosa 
#> # … with 140 more rows
{code}
Created on 2019-06-11 by the [reprex package|https://reprex.tidyverse.org/] 
(v0.3.0.9000)
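
The distinction between borrowing from a memory-mapped file and copying out of it can be illustrated outside of R. The following is a minimal sketch using Python's standard mmap module rather than the arrow package, so it says nothing about how mmap_open() is implemented. It only shows what "borrowed" versus "copied" means for a mapped file.

```python
import mmap
import os
import tempfile

# Write a small CSV file to map (contents are made up for the example)
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(b"Sepal.Length,Species\n5.1,setosa\n4.9,setosa\n")

with open(path, "rb") as f:
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    borrowed = memoryview(mapped)      # zero-copy view backed by the mapping
    header_view = borrowed[:20]        # still no copy, just a narrower view
    header_copy = bytes(header_view)   # explicit copy into process memory
    first_line = header_copy.decode("ascii")
    # Views must be released before the mapping can be closed
    header_view.release()
    borrowed.release()
    mapped.close()

os.remove(path)
print(first_line)
```

The copy outlives the mapping, whereas the views are only valid while the file stays mapped. That lifetime constraint is the kind of thing a borrowing approach has to manage.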

> [R] file readers should mmap
> 
>
> Key: ARROW-5502
> URL: https://issues.apache.org/jira/browse/ARROW-5502
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> Arrow is supposed to let you work with datasets bigger than memory. Memory 
> mapping is a big part of that. It should be the default way that files are 
> read in the `read_*` functions. To disable memory mapping, we could use a 
> global `option()`, or a function argument, but that might clutter the 
> interface. Or we could not give a choice and only fall back to not memory 
> mapping if the platform/file system doesn't support it.





[jira] [Resolved] (ARROW-5476) [Java][Memory] Fix Netty ArrowBuf Slice

2019-06-11 Thread Pindikura Ravindra (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pindikura Ravindra resolved ARROW-5476.
---
Resolution: Fixed

Issue resolved by pull request 4451
[https://github.com/apache/arrow/pull/4451]

> [Java][Memory] Fix Netty ArrowBuf Slice
> ---
>
> Key: ARROW-5476
> URL: https://issues.apache.org/jira/browse/ARROW-5476
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Affects Versions: 0.14.0
>Reporter: Praveen Kumar Desabandu
>Assignee: Praveen Kumar Desabandu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The slice of netty arrow buf depends on arrow buf reader and writer indexes, 
> but arrow buf is supposed to only track memory addr + length and there are 
> places where the arrow buf indexes are not in sync with netty.
> So slice should use the indexes in Netty Arrow Buf instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5502) [R] file readers should mmap

2019-06-11 Thread Neal Richardson (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861100#comment-16861100
 ] 

Neal Richardson commented on ARROW-5502:


Memory mapping would make loading data into memory (before copying it to R) 
lazy, and it will be necessary for things like {{read_parquet(f, col_select)}} 
to avoid reading all columns into Arrow before copying to R.

Yes, I believed it was possible now, but that's not a friendly enough interface 
for package users, IMO. 

> [R] file readers should mmap
> 
>
> Key: ARROW-5502
> URL: https://issues.apache.org/jira/browse/ARROW-5502
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> Arrow is supposed to let you work with datasets bigger than memory. Memory 
> mapping is a big part of that. It should be the default way that files are 
> read in the `read_*` functions. To disable memory mapping, we could use a 
> global `option()`, or a function argument, but that might clutter the 
> interface. Or we could not give a choice and only fall back to not memory 
> mapping if the platform/file system doesn't support it.





[jira] [Created] (ARROW-5553) red-arrow gem does not compile on ruby:2.5 docker image

2019-06-11 Thread Sean Dilda (JIRA)
Sean Dilda created ARROW-5553:
-

 Summary: red-arrow gem does not compile on ruby:2.5 docker image
 Key: ARROW-5553
 URL: https://issues.apache.org/jira/browse/ARROW-5553
 Project: Apache Arrow
  Issue Type: Bug
  Components: Ruby
Affects Versions: 0.13.0
Reporter: Sean Dilda


I'm attempting to install the red-arrow gem in a Docker container based on the 
ruby:2.5 image on Docker Hub. I followed the Debian instructions on 
[https://github.com/red-data-tools/packages.red-data-tools.org] to install 
libarrow-dev.

 

I then ran 'gem install red-arrow'

The output is as follows:

Building native extensions. This could take a while...
ERROR:  Error installing red-arrow:
ERROR: Failed to build gem native extension.

current directory: /usr/local/bundle/gems/red-arrow-0.13.0/ext/arrow
/usr/local/bin/ruby -I /usr/local/lib/ruby/site_ruby/2.5.0 -r 
./siteconf20190611-1782-1gg8bpn.rb extconf.rb
checking --enable-debug-build option... no
checking C++ compiler... g++
checking g++ version... 6.3 (gnu++14)
checking for --enable-debug-build option... no
checking for -Wall option to compiler... yes
checking for -Waggregate-return option to compiler... yes
checking for -Wcast-align option to compiler... yes
checking for -Wextra option to compiler... yes
checking for -Wformat=2 option to compiler... yes
checking for -Winit-self option to compiler... yes
checking for -Wlarger-than-65500 option to compiler... yes
checking for -Wmissing-declarations option to compiler... yes
checking for -Wmissing-format-attribute option to compiler... yes
checking for -Wmissing-include-dirs option to compiler... yes
checking for -Wmissing-noreturn option to compiler... yes
checking for -Wmissing-prototypes option to compiler... yes
checking for -Wnested-externs option to compiler... yes
checking for -Wold-style-definition option to compiler... yes
checking for -Wpacked option to compiler... yes
checking for -Wp,-D_FORTIFY_SOURCE=2 option to compiler... yes
checking for -Wpointer-arith option to compiler... yes
checking for -Wswitch-default option to compiler... yes
checking for -Wswitch-enum option to compiler... yes
checking for -Wundef option to compiler... yes
checking for -Wout-of-line-declaration option to compiler... no
checking for -Wunsafe-loop-optimizations option to compiler... yes
checking for -Wwrite-strings option to compiler... yes
checking for Homebrew... no
checking for arrow... yes
checking for arrow-glib... yes
creating Makefile

current directory: /usr/local/bundle/gems/red-arrow-0.13.0/ext/arrow
make "DESTDIR=" clean

current directory: /usr/local/bundle/gems/red-arrow-0.13.0/ext/arrow
make "DESTDIR="
compiling arrow.cpp
compiling record-batch.cpp
record-batch.cpp: In member function 'VALUE 
red_arrow::{anonymous}::StructArrayValueConverter::convert(const 
arrow::StructArray&, int64_t)':
record-batch.cpp:344:40: error: 'const class arrow::StructArray' has no member 
named 'struct_type'
 const auto struct_type = array.struct_type();
^~~
In file included from /usr/local/include/ruby-2.5.0/ruby/ruby.h:29:0,
 from /usr/local/include/ruby-2.5.0/ruby.h:33,
 from 
/usr/local/bundle/gems/glib2-3.3.6/ext/glib2/rbgobject.h:26,
 from red-arrow.hpp:33,
 from record-batch.cpp:20:
/usr/local/include/ruby-2.5.0/ruby/defines.h:105:57: error: void value not 
ignored as it ought to be
 #define RB_GNUC_EXTENSION_BLOCK(x) __extension__ ({ x; })
 ^
/usr/local/include/ruby-2.5.0/ruby/intern.h:788:35: note: in expansion of macro 
'RB_GNUC_EXTENSION_BLOCK'
 #define rb_utf8_str_new(str, len) RB_GNUC_EXTENSION_BLOCK( \
   ^~~
record-batch.cpp:350:18: note: in expansion of macro 'rb_utf8_str_new'
   key_ = rb_utf8_str_new(field_name.data(), field_name.length());
  ^~~
record-batch.cpp: In member function 'uint8_t 
red_arrow::{anonymous}::UnionArrayValueConverter::compute_child_index(const 
arrow::UnionArray&, arrow::UnionType*, const char*)':
record-batch.cpp:516:66: error: no matching function for call to 
'arrow::Status::Invalid(const char [18], const unsigned char&)'
 check_status(Status::Invalid("Unknown type ID: ", type_id),
  ^
In file included from /usr/include/arrow/buffer.h:30:0,
 from /usr/include/arrow/array.h:28,
 from /usr/include/arrow/api.h:23,
 from red-arrow.hpp:22,
 from record-batch.cpp:20:
/usr/include/arrow/status.h:150:17: note: candidate: static arrow::Status 
arrow::Status::Invalid(const string&)
   static Status Invalid(const std::string& msg) {
 ^~~
/usr/include/arrow/status.h:

[jira] [Commented] (ARROW-5502) [R] file readers should mmap

2019-06-11 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861119#comment-16861119
 ] 

Wes McKinney commented on ARROW-5502:
-

The Parquet C++ library by default only reads the serialized column data from 
disk that needs to be deserialized. Using memory-mapping indeed avoids memory 
allocation.

Note that for high-latency file sources (like Amazon S3), where memory 
mapping is not possible, many data warehousing systems have found it more 
efficient to read an entire Parquet row group into memory at a time and discard 
the unused columns. We will likely be forced as a matter of performance 
optimization to add some reader options to parquet-cpp around this issue.
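
The trade-off described above can be sketched with a toy model. No Parquet library is involved here: the "row group" is just three byte strings stored back to back, and the class and function names are hypothetical. The sketch only counts read requests, which is the cost that dominates on a high-latency source.

```python
import io


class CountingFile(io.BytesIO):
    """In-memory file that counts read() calls, standing in for a remote source."""

    def __init__(self, payload):
        super().__init__(payload)
        self.read_calls = 0

    def read(self, size=-1):
        self.read_calls += 1
        return super().read(size)


# Hypothetical row group: three column chunks stored back to back
chunks = {"a": b"A" * 4, "b": b"B" * 6, "c": b"C" * 2}
layout, offset = {}, 0
for name, payload in chunks.items():
    layout[name] = (offset, len(payload))  # (start offset, length)
    offset += len(payload)
row_group = b"".join(chunks.values())


def read_selected(f, layout, columns):
    """One seek + read per requested column chunk."""
    out = {}
    for name in columns:
        start, length = layout[name]
        f.seek(start)
        out[name] = f.read(length)
    return out


def read_whole_then_discard(f, layout, columns):
    """One read for the whole row group, then keep only the wanted columns."""
    f.seek(0)
    blob = f.read()
    return {name: blob[layout[name][0]:layout[name][0] + layout[name][1]]
            for name in columns}


selective = CountingFile(row_group)
whole = CountingFile(row_group)
cols_a = read_selected(selective, layout, ["a", "c"])
cols_b = read_whole_then_discard(whole, layout, ["a", "c"])
print(selective.read_calls, whole.read_calls)
```

Both strategies return the same columns. The first transfers fewer bytes but issues one request per column, while the second issues a single request at the cost of reading bytes that are then thrown away.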

> [R] file readers should mmap
> 
>
> Key: ARROW-5502
> URL: https://issues.apache.org/jira/browse/ARROW-5502
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> Arrow is supposed to let you work with datasets bigger than memory. Memory 
> mapping is a big part of that. It should be the default way that files are 
> read in the `read_*` functions. To disable memory mapping, we could use a 
> global `option()`, or a function argument, but that might clutter the 
> interface. Or we could not give a choice and only fall back to not memory 
> mapping if the platform/file system doesn't support it.





[jira] [Created] (ARROW-5554) Add a python wrapper for arrow::Concatenate

2019-06-11 Thread Zhuo Peng (JIRA)
Zhuo Peng created ARROW-5554:


 Summary: Add a python wrapper for arrow::Concatenate
 Key: ARROW-5554
 URL: https://issues.apache.org/jira/browse/ARROW-5554
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.14.0
Reporter: Zhuo Peng








[jira] [Updated] (ARROW-5554) Add a python wrapper for arrow::Concatenate

2019-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5554:
--
Labels: pull-request-available  (was: )

> Add a python wrapper for arrow::Concatenate
> ---
>
> Key: ARROW-5554
> URL: https://issues.apache.org/jira/browse/ARROW-5554
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.14.0
>Reporter: Zhuo Peng
>Priority: Minor
>  Labels: pull-request-available
>






[jira] [Commented] (ARROW-5554) Add a python wrapper for arrow::Concatenate

2019-06-11 Thread Zhuo Peng (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861143#comment-16861143
 ] 

Zhuo Peng commented on ARROW-5554:
--

[https://github.com/apache/arrow/pull/4519]

> Add a python wrapper for arrow::Concatenate
> ---
>
> Key: ARROW-5554
> URL: https://issues.apache.org/jira/browse/ARROW-5554
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.14.0
>Reporter: Zhuo Peng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Comment Edited] (ARROW-5412) [Java] Integration test fails with UnsupportedOperationException

2019-06-11 Thread Benjamin Kietzman (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861159#comment-16861159
 ] 

Benjamin Kietzman edited comment on ARROW-5412 at 6/11/19 3:52 PM:
---

Tried again with Oracle JDK 8, got the same error:
{code}
$ apt search jdk | grep installed
jdk1.8/now 1.8.0211-1 amd64 [installed,local]
libslf4j-java/bionic,bionic,now 1.7.25-3 all [installed,auto-removable]
{code}


was (Author: bkietz):
Tried again with Oracle JDK 8, got the same error:
{code}
{code}

> [Java] Integration test fails with UnsupportedOperationException
> 
>
> Key: ARROW-5412
> URL: https://issues.apache.org/jira/browse/ARROW-5412
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Benjamin Kietzman
>Priority: Minor
>
> Running the java integration test fails with an exception:
> {code}
> $ java -classpath 
> ~/arrow/java/tools/target/arrow-tools-0.14.0-SNAPSHOT-jar-with-dependencies.jar
>  -Dio.netty.tryReflectionSetAccessible=false 
> org.apache.arrow.tools.Integration -a ~/tmp/1832930b_simple.json_as_file -j 
> ~/arrow/integration/data/simple.json -c VALIDATE
> ...
> Incompatible files
> sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
> 08:55:43.597 [main] ERROR org.apache.arrow.tools.Integration - Incompatible 
> files
> java.lang.UnsupportedOperationException: sun.misc.Unsafe or 
> java.nio.DirectByteBuffer.<init>(long, int) not available
>   at 
> io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:399)
>   at io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:233)
>   at io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:223)
>   at io.netty.buffer.ArrowBuf.nioBuffer(ArrowBuf.java:245)
>   at 
> org.apache.arrow.vector.ipc.message.ArrowRecordBatch.computeBodyLength(ArrowRecordBatch.java:211)
>   at 
> org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:175)
>   at 
> org.apache.arrow.vector.ipc.ArrowWriter.writeRecordBatch(ArrowWriter.java:119)
>   at 
> org.apache.arrow.vector.ipc.ArrowFileWriter.writeRecordBatch(ArrowFileWriter.java:61)
>   at 
> org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:107)
>   at 
> org.apache.arrow.tools.Integration$Command$2.execute(Integration.java:171)
>   at org.apache.arrow.tools.Integration.run(Integration.java:118)
>   at org.apache.arrow.tools.Integration.main(Integration.java:69)
> {code}
> Looking through netty's source, it looks like this exception is [emitted 
> here|https://github.com/netty/netty/blob/master/common/src/main/java/io/netty/util/internal/PlatformDependent.java#L343-L344].
> {code}
> $ apt search jdk | grep installed
> default-jre/bionic-updates,bionic-security,now 2:1.11-68ubuntu1~18.04.1 amd64 
> [installed,automatic]
> default-jre-headless/bionic-updates,bionic-security,now 
> 2:1.11-68ubuntu1~18.04.1 amd64 [installed,automatic]
> libslf4j-java/bionic,bionic,now 1.7.25-3 all [installed,automatic]
> openjdk-11-jdk/bionic-updates,bionic-security,now 11.0.3+7-1ubuntu2~18.04.1 
> amd64 [installed]
> openjdk-11-jdk-headless/bionic-updates,bionic-security,now 
> 11.0.3+7-1ubuntu2~18.04.1 amd64 [installed,automatic]
> openjdk-11-jre/bionic-updates,bionic-security,now 11.0.3+7-1ubuntu2~18.04.1 
> amd64 [installed,automatic]
> openjdk-11-jre-headless/bionic-updates,bionic-security,now 
> 11.0.3+7-1ubuntu2~18.04.1 amd64 [installed,automatic]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5412) [Java] Integration test fails with UnsupportedOperationException

2019-06-11 Thread Benjamin Kietzman (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861159#comment-16861159
 ] 

Benjamin Kietzman commented on ARROW-5412:
--

Tried again with Oracle JDK 8, got the same error:
{code}
{code}

> [Java] Integration test fails with UnsupportedOperationException
> 
>
> Key: ARROW-5412
> URL: https://issues.apache.org/jira/browse/ARROW-5412
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Benjamin Kietzman
>Priority: Minor
>
> Running the java integration test fails with an exception:
> {code}
> $ java -classpath 
> ~/arrow/java/tools/target/arrow-tools-0.14.0-SNAPSHOT-jar-with-dependencies.jar
>  -Dio.netty.tryReflectionSetAccessible=false 
> org.apache.arrow.tools.Integration -a ~/tmp/1832930b_simple.json_as_file -j 
> ~/arrow/integration/data/simple.json -c VALIDATE
> ...
> Incompatible files
> sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
> 08:55:43.597 [main] ERROR org.apache.arrow.tools.Integration - Incompatible 
> files
> java.lang.UnsupportedOperationException: sun.misc.Unsafe or 
> java.nio.DirectByteBuffer.<init>(long, int) not available
>   at 
> io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:399)
>   at io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:233)
>   at io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:223)
>   at io.netty.buffer.ArrowBuf.nioBuffer(ArrowBuf.java:245)
>   at 
> org.apache.arrow.vector.ipc.message.ArrowRecordBatch.computeBodyLength(ArrowRecordBatch.java:211)
>   at 
> org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:175)
>   at 
> org.apache.arrow.vector.ipc.ArrowWriter.writeRecordBatch(ArrowWriter.java:119)
>   at 
> org.apache.arrow.vector.ipc.ArrowFileWriter.writeRecordBatch(ArrowFileWriter.java:61)
>   at 
> org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:107)
>   at 
> org.apache.arrow.tools.Integration$Command$2.execute(Integration.java:171)
>   at org.apache.arrow.tools.Integration.run(Integration.java:118)
>   at org.apache.arrow.tools.Integration.main(Integration.java:69)
> {code}
> Looking through netty's source, it looks like this exception is [emitted 
> here|https://github.com/netty/netty/blob/master/common/src/main/java/io/netty/util/internal/PlatformDependent.java#L343-L344].
> {code}
> $ apt search jdk | grep installed
> default-jre/bionic-updates,bionic-security,now 2:1.11-68ubuntu1~18.04.1 amd64 
> [installed,automatic]
> default-jre-headless/bionic-updates,bionic-security,now 
> 2:1.11-68ubuntu1~18.04.1 amd64 [installed,automatic]
> libslf4j-java/bionic,bionic,now 1.7.25-3 all [installed,automatic]
> openjdk-11-jdk/bionic-updates,bionic-security,now 11.0.3+7-1ubuntu2~18.04.1 
> amd64 [installed]
> openjdk-11-jdk-headless/bionic-updates,bionic-security,now 
> 11.0.3+7-1ubuntu2~18.04.1 amd64 [installed,automatic]
> openjdk-11-jre/bionic-updates,bionic-security,now 11.0.3+7-1ubuntu2~18.04.1 
> amd64 [installed,automatic]
> openjdk-11-jre-headless/bionic-updates,bionic-security,now 
> 11.0.3+7-1ubuntu2~18.04.1 amd64 [installed,automatic]
> {code}





[jira] [Commented] (ARROW-5412) [Java] Integration test fails with UnsupportedOperationException

2019-06-11 Thread Bryan Cutler (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861192#comment-16861192
 ] 

Bryan Cutler commented on ARROW-5412:
-

I also saw the same error with JDK 8 and 9. I got it passing for Java by adding 
{{-Dio.netty.tryReflectionSetAccessible=true}} to the cmd. Let me put up a PR 
for the fix.
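
A minimal sketch of where such a flag would go if the command is assembled from a script. The helper name build_java_command and its parameters are hypothetical; only the -D property and the org.apache.arrow.tools.Integration arguments (-a, -j, -c) come from the report. JVM system properties have to appear before the main class name, otherwise they are handed to the tool as ordinary program arguments.

```python
import os

# Property named in the comment above; everything else here is illustrative.
NETTY_REFLECTION_FLAG = "-Dio.netty.tryReflectionSetAccessible=true"


def build_java_command(jar_path, arrow_path, json_path, command="VALIDATE"):
    """Return the argv list for running the Arrow Java integration tool.

    Hypothetical helper: JVM options are placed before the main class so
    that the JVM (not the tool) consumes them.
    """
    return [
        "java",
        NETTY_REFLECTION_FLAG,
        "-classpath", os.path.expanduser(jar_path),
        "org.apache.arrow.tools.Integration",
        "-a", os.path.expanduser(arrow_path),
        "-j", os.path.expanduser(json_path),
        "-c", command,
    ]
```

The resulting list can be passed to subprocess.run() as-is; no shell quoting is needed because each argument is a separate list element.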

> [Java] Integration test fails with UnsupportedOperationException
> 
>
> Key: ARROW-5412
> URL: https://issues.apache.org/jira/browse/ARROW-5412
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Benjamin Kietzman
>Priority: Minor
>
> Running the java integration test fails with an exception:
> {code}
> $ java -classpath 
> ~/arrow/java/tools/target/arrow-tools-0.14.0-SNAPSHOT-jar-with-dependencies.jar
>  -Dio.netty.tryReflectionSetAccessible=false 
> org.apache.arrow.tools.Integration -a ~/tmp/1832930b_simple.json_as_file -j 
> ~/arrow/integration/data/simple.json -c VALIDATE
> ...
> Incompatible files
> sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
> 08:55:43.597 [main] ERROR org.apache.arrow.tools.Integration - Incompatible 
> files
> java.lang.UnsupportedOperationException: sun.misc.Unsafe or 
> java.nio.DirectByteBuffer.<init>(long, int) not available
>   at 
> io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:399)
>   at io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:233)
>   at io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:223)
>   at io.netty.buffer.ArrowBuf.nioBuffer(ArrowBuf.java:245)
>   at 
> org.apache.arrow.vector.ipc.message.ArrowRecordBatch.computeBodyLength(ArrowRecordBatch.java:211)
>   at 
> org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:175)
>   at 
> org.apache.arrow.vector.ipc.ArrowWriter.writeRecordBatch(ArrowWriter.java:119)
>   at 
> org.apache.arrow.vector.ipc.ArrowFileWriter.writeRecordBatch(ArrowFileWriter.java:61)
>   at 
> org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:107)
>   at 
> org.apache.arrow.tools.Integration$Command$2.execute(Integration.java:171)
>   at org.apache.arrow.tools.Integration.run(Integration.java:118)
>   at org.apache.arrow.tools.Integration.main(Integration.java:69)
> {code}
> Looking through netty's source, it looks like this exception is [emitted 
> here|https://github.com/netty/netty/blob/master/common/src/main/java/io/netty/util/internal/PlatformDependent.java#L343-L344].
> {code}
> $ apt search jdk | grep installed
> default-jre/bionic-updates,bionic-security,now 2:1.11-68ubuntu1~18.04.1 amd64 
> [installed,automatic]
> default-jre-headless/bionic-updates,bionic-security,now 
> 2:1.11-68ubuntu1~18.04.1 amd64 [installed,automatic]
> libslf4j-java/bionic,bionic,now 1.7.25-3 all [installed,automatic]
> openjdk-11-jdk/bionic-updates,bionic-security,now 11.0.3+7-1ubuntu2~18.04.1 
> amd64 [installed]
> openjdk-11-jdk-headless/bionic-updates,bionic-security,now 
> 11.0.3+7-1ubuntu2~18.04.1 amd64 [installed,automatic]
> openjdk-11-jre/bionic-updates,bionic-security,now 11.0.3+7-1ubuntu2~18.04.1 
> amd64 [installed,automatic]
> openjdk-11-jre-headless/bionic-updates,bionic-security,now 
> 11.0.3+7-1ubuntu2~18.04.1 amd64 [installed,automatic]
> {code}





[jira] [Commented] (ARROW-5548) [Documentation] http://arrow.apache.org/docs/latest/ is not latest

2019-06-11 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861201#comment-16861201
 ] 

Antoine Pitrou commented on ARROW-5548:
---

I'm skeptical we can modify arrow.apache.org from within Travis-CI jobs...

By the way, ideally the docs are built from a CUDA-enabled machine (at least a 
machine with the CUDA toolkit installed?) so as to display CUDA API docs as 
well.

> [Documentation] http://arrow.apache.org/docs/latest/ is not latest
> --
>
> Key: ARROW-5548
> URL: https://issues.apache.org/jira/browse/ARROW-5548
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Website
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> In testing out the Dockerfile for building the docs, I noticed it created an 
> asf-site/docs/latest directory at the end. Out of curiosity, I went to 
> [http://arrow.apache.org/docs/latest/], and it reports a version of 
> {{0.11.1.dev473+g6ed02454}}, which is not close to "latest".
> I'd like to see this "latest" site get updated automatically. I'm working on 
> getting this Docker setup complete (cf. 
> https://issues.apache.org/jira/browse/ARROW-5497), and once that's working, 
> it should be feasible to add a Travis-CI job to update /docs/latest on every 
> commit to master to apache/arrow. 
> cc [~wesmckinn]





[jira] [Resolved] (ARROW-5551) [Go] invalid FixedSizeArray representation

2019-06-11 Thread Sebastien Binet (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastien Binet resolved ARROW-5551.

   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4517
[https://github.com/apache/arrow/pull/4517]

> [Go] invalid FixedSizeArray representation
> --
>
> Key: ARROW-5551
> URL: https://issues.apache.org/jira/browse/ARROW-5551
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Go
>Reporter: Sebastien Binet
>Assignee: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> FixedSizeArrays are currently represented with a 3-buffer data layout, 
> but the C++ definition expects a 2-buffer data layout (as for all the primitive 
> arrays).
> (uncovered while trying to roundtrip all "integration" tests.)
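
A minimal sketch of the 2-buffer layout mentioned above, assuming the issue concerns fixed-size binary values: one validity bitmap plus one contiguous data buffer, with no offsets buffer. The function name fixed_size_binary_buffers is hypothetical and this is plain Python, not the Go or C++ implementation.

```python
def fixed_size_binary_buffers(values, byte_width):
    """Build [validity_bitmap, data] for a list of bytes-or-None values.

    Every slot occupies exactly byte_width bytes, so slot i starts at
    i * byte_width and no offsets buffer is needed.
    """
    validity = bytearray((len(values) + 7) // 8)
    data = bytearray()
    for i, value in enumerate(values):
        if value is None:
            # Null slots still take up space in the data buffer.
            data.extend(b"\x00" * byte_width)
            continue
        if len(value) != byte_width:
            raise ValueError("value %d has length %d, expected %d"
                             % (i, len(value), byte_width))
        validity[i // 8] |= 1 << (i % 8)  # least-significant-bit numbering
        data.extend(value)
    return [bytes(validity), bytes(data)]
```

A variable-length binary array, by contrast, needs a third buffer of offsets because its slots do not have a fixed width.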





[jira] [Commented] (ARROW-5548) [Documentation] http://arrow.apache.org/docs/latest/ is not latest

2019-06-11 Thread Neal Richardson (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861210#comment-16861210
 ] 

Neal Richardson commented on ARROW-5548:


Is arrow.apache.org ([https://github.com/apache/arrow-site]) hosted with GitHub 
Pages, or is there some manual update step? If it is GitHub Pages, we can build on 
Travis and push to it; we would only need a GitHub personal access token from a user 
with commit privileges on the repo (which we would store encrypted in the 
.travis.yml). 

Or we can do it on the server where buildbot is running, which has CUDA?

> [Documentation] http://arrow.apache.org/docs/latest/ is not latest
> --
>
> Key: ARROW-5548
> URL: https://issues.apache.org/jira/browse/ARROW-5548
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Website
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> In testing out the Dockerfile for building the docs, I noticed it created an 
> asf-site/docs/latest directory at the end. Out of curiosity, I went to 
> [http://arrow.apache.org/docs/latest/], and it reports a version of 
> {{0.11.1.dev473+g6ed02454}}, which is not close to "latest".
> I'd like to see this "latest" site get updated automatically. I'm working on 
> getting this Docker setup complete (cf. 
> https://issues.apache.org/jira/browse/ARROW-5497), and once that's working, 
> it should be feasible to add a Travis-CI job to update /docs/latest on every 
> commit to master to apache/arrow. 
> cc [~wesmckinn]





[jira] [Assigned] (ARROW-5509) [R] write_parquet()

2019-06-11 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-5509:
--

Assignee: Uwe L. Korn  (was: Romain François)

> [R] write_parquet()
> ---
>
> Key: ARROW-5509
> URL: https://issues.apache.org/jira/browse/ARROW-5509
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The R package can read Parquet files but not yet write them. The C++ library 
> supports writing, and pyarrow already exposes it.





[jira] [Resolved] (ARROW-1207) [C++] Implement Map logical type

2019-06-11 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-1207.
---
Resolution: Fixed

Issue resolved by pull request 4352
[https://github.com/apache/arrow/pull/4352]

> [C++] Implement Map logical type
> 
>
> Key: ARROW-1207
> URL: https://issues.apache.org/jira/browse/ARROW-1207
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Benjamin Kietzman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> A map is implemented as a list of structs with fields key and value. We 
> should separately discuss whether this merits an addition to the Arrow 
> metadata.
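
A minimal sketch of the "list of structs with fields key and value" idea, in plain Python rather than the C++ implementation. The names encode_map_column and decode_map_column are hypothetical, and nulls are ignored for brevity. Each row's entries are a slice of two flat child arrays, delimited by list offsets.

```python
def encode_map_column(rows):
    """Flatten a list of dicts into (offsets, keys, values).

    Row i owns entries offsets[i]:offsets[i + 1] of the keys and values
    arrays, which together play the role of the struct's two fields.
    """
    offsets = [0]
    keys, values = [], []
    for row in rows:
        for key, value in row.items():
            keys.append(key)
            values.append(value)
        offsets.append(len(keys))
    return offsets, keys, values


def decode_map_column(offsets, keys, values):
    """Rebuild the list of dicts from the flattened representation."""
    return [
        dict(zip(keys[start:end], values[start:end]))
        for start, end in zip(offsets, offsets[1:])
    ]
```

An empty map shows up as two equal consecutive offsets, which is the same convention a plain list column uses for an empty list.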





[jira] [Closed] (ARROW-4433) [R] Segmentation fault when instantiating arrow::table from data frame

2019-06-11 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson closed ARROW-4433.
--
Resolution: Cannot Reproduce

Closing; can reopen if we get feedback about reproducing.

> [R] Segmentation fault when instantiating arrow::table from data frame
> --
>
> Key: ARROW-4433
> URL: https://issues.apache.org/jira/browse/ARROW-4433
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
> Environment: R version 3.5.2 (2018-12-20)
> Platform: x86_64-suse-linux-gnu (64-bit)
>Reporter: Lutz
>Priority: Critical
> Fix For: 0.14.0
>
>
> The sample code from [https://github.com/apache/arrow/tree/master/r] leads to 
> a segmentation fault
>  
> {quote}library(arrow, warn.conflicts = FALSE)
>  library(tibble)
>  library(reticulate)
>  tf <- tempfile() 
> (tib <- tibble(x = 1:10, y = rnorm(10)))
> arrow::write_arrow(tib, tf)
>  *** caught segfault *** 
> address (nil), cause 'memory not mapped' 
>  
> Traceback: 
>  1: Table__from_dataframe(.data) 
>  2: shared_ptr_is_null(xp) 
>  3: shared_ptr(`arrow::Table`, Table__from_dataframe(.data)) 
>  4: table(x) 
>  5: to_arrow.data.frame(x) 
>  6: to_arrow(x) 
>  7: write_arrow.fs_path(x, fs::path_abs(stream), ...) 
>  8: write_arrow(x, fs::path_abs(stream), ...) 
>  9: write_arrow.character(tib, tf) 
> 10: arrow::write_arrow(tib, tf)
>  {quote}
>  
> The same problem appears also when just calling arrow::table(tib):
> {quote}> arrow::table(tib) 
>  
>  *** caught segfault *** 
> address (nil), cause 'memory not mapped' 
>  
> Traceback: 
>  1: Table__from_dataframe(.data) 
>  2: shared_ptr_is_null(xp) 
>  3: shared_ptr(`arrow::Table`, Table__from_dataframe(.data)) 
>  4: arrow::table(tib)
>  
> {quote}





[jira] [Updated] (ARROW-5488) [R] Workaround when C++ lib not available

2019-06-11 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-5488:
---
Priority: Blocker  (was: Major)

> [R] Workaround when C++ lib not available
> -
>
> Key: ARROW-5488
> URL: https://issues.apache.org/jira/browse/ARROW-5488
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Romain François
>Assignee: Romain François
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> As a way to get to CRAN, we need some way for the package to still compile, 
> install, and test (although doing nothing useful) even when the C++ lib is not 
> available. 





[jira] [Assigned] (ARROW-5488) [R] Workaround when C++ lib not available

2019-06-11 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-5488:
--

Assignee: Neal Richardson

> [R] Workaround when C++ lib not available
> -
>
> Key: ARROW-5488
> URL: https://issues.apache.org/jira/browse/ARROW-5488
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Romain François
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> As a way to get to CRAN, we need some way for the package to still compile, 
> install, and test (although doing nothing useful) even when the C++ lib is not 
> available. 





[jira] [Assigned] (ARROW-5488) [R] Workaround when C++ lib not available

2019-06-11 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-5488:
--

Assignee: Romain François  (was: Neal Richardson)

> [R] Workaround when C++ lib not available
> -
>
> Key: ARROW-5488
> URL: https://issues.apache.org/jira/browse/ARROW-5488
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Romain François
>Assignee: Romain François
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> As a way to get to CRAN, we need some way for the package to still compile, 
> install, and test (although doing nothing useful) even when the C++ lib is not 
> available. 





[jira] [Updated] (ARROW-5488) [R] Workaround when C++ lib not available

2019-06-11 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-5488:
---
Fix Version/s: 0.14.0

> [R] Workaround when C++ lib not available
> -
>
> Key: ARROW-5488
> URL: https://issues.apache.org/jira/browse/ARROW-5488
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Romain François
>Assignee: Romain François
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> As a way to get to CRAN, we need some way for the package to still compile, 
> install, and test (although doing nothing useful) even when the C++ lib is not 
> available. 





[jira] [Created] (ARROW-5555) [R] install_arrow()

2019-06-11 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-:
--

 Summary: [R] install_arrow()
 Key: ARROW-
 URL: https://issues.apache.org/jira/browse/ARROW-
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Affects Versions: 0.14.0
Reporter: Neal Richardson
Assignee: Neal Richardson


Following ARROW-5488, it will be possible to install the R package without 
having libarrow installed, but you won't be able to do anything until you do. 
The error message you get when trying to use the package directs you to call 
{{install_arrow()}}. 

This function will at a minimum give a recommendation of steps to take to 
install the library. In some cases, we may be able to download and install it 
for the user.





[jira] [Commented] (ARROW-5412) [Java] Integration test fails with UnsupportedOperationException

2019-06-11 Thread Bryan Cutler (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861245#comment-16861245
 ] 

Bryan Cutler commented on ARROW-5412:
-

Actually, I got it working with JDK 8 without the above conf. I think 
previously, I had compiled with JDK 8 and tested with JDK 9 which led to the 
netty error. I'll try again with just JDK 9.

> [Java] Integration test fails with UnsupportedOperationException
> 
>
> Key: ARROW-5412
> URL: https://issues.apache.org/jira/browse/ARROW-5412
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Benjamin Kietzman
>Priority: Minor
>
> Running the java integration test fails with an exception:
> {code}
> $ java -classpath 
> ~/arrow/java/tools/target/arrow-tools-0.14.0-SNAPSHOT-jar-with-dependencies.jar
>  -Dio.netty.tryReflectionSetAccessible=false 
> org.apache.arrow.tools.Integration -a ~/tmp/1832930b_simple.json_as_file -j 
> ~/arrow/integration/data/simple.json -c VALIDATE
> ...
> Incompatible files
> sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
> 08:55:43.597 [main] ERROR org.apache.arrow.tools.Integration - Incompatible 
> files
> java.lang.UnsupportedOperationException: sun.misc.Unsafe or 
> java.nio.DirectByteBuffer.<init>(long, int) not available
>   at 
> io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:399)
>   at io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:233)
>   at io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:223)
>   at io.netty.buffer.ArrowBuf.nioBuffer(ArrowBuf.java:245)
>   at 
> org.apache.arrow.vector.ipc.message.ArrowRecordBatch.computeBodyLength(ArrowRecordBatch.java:211)
>   at 
> org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:175)
>   at 
> org.apache.arrow.vector.ipc.ArrowWriter.writeRecordBatch(ArrowWriter.java:119)
>   at 
> org.apache.arrow.vector.ipc.ArrowFileWriter.writeRecordBatch(ArrowFileWriter.java:61)
>   at 
> org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:107)
>   at 
> org.apache.arrow.tools.Integration$Command$2.execute(Integration.java:171)
>   at org.apache.arrow.tools.Integration.run(Integration.java:118)
>   at org.apache.arrow.tools.Integration.main(Integration.java:69)
> {code}
> Looking through netty's source, it looks like this exception is [emitted 
> here|https://github.com/netty/netty/blob/master/common/src/main/java/io/netty/util/internal/PlatformDependent.java#L343-L344].
> {code}
> $ apt search jdk | grep installed
> default-jre/bionic-updates,bionic-security,now 2:1.11-68ubuntu1~18.04.1 amd64 
> [installed,automatic]
> default-jre-headless/bionic-updates,bionic-security,now 
> 2:1.11-68ubuntu1~18.04.1 amd64 [installed,automatic]
> libslf4j-java/bionic,bionic,now 1.7.25-3 all [installed,automatic]
> openjdk-11-jdk/bionic-updates,bionic-security,now 11.0.3+7-1ubuntu2~18.04.1 
> amd64 [installed]
> openjdk-11-jdk-headless/bionic-updates,bionic-security,now 
> 11.0.3+7-1ubuntu2~18.04.1 amd64 [installed,automatic]
> openjdk-11-jre/bionic-updates,bionic-security,now 11.0.3+7-1ubuntu2~18.04.1 
> amd64 [installed,automatic]
> openjdk-11-jre-headless/bionic-updates,bionic-security,now 
> 11.0.3+7-1ubuntu2~18.04.1 amd64 [installed,automatic]
> {code}





[jira] [Assigned] (ARROW-5527) [C++] HashTable/MemoTable should use Buffer(s)/Builder(s) for heap data

2019-06-11 Thread Francois Saint-Jacques (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques reassigned ARROW-5527:
-

Assignee: Francois Saint-Jacques

> [C++] HashTable/MemoTable should use Buffer(s)/Builder(s) for heap data
> ---
>
> Key: ARROW-5527
> URL: https://issues.apache.org/jira/browse/ARROW-5527
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>
> The current implementation uses `std::vector` and `std::string` with 
> unbounded size. The refactor would take a memory pool in the constructor for 
> buffer management and would get rid of vectors.
> This will have the side effect of propagating Status to some calls (notably 
> insert due to Upsize failing to resize).





[jira] [Resolved] (ARROW-3897) [MATLAB] Add MATLAB support for writing numeric datatypes to a Feather file

2019-06-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3897.
-
Resolution: Fixed

Issue resolved by pull request 4328
[https://github.com/apache/arrow/pull/4328]

> [MATLAB] Add MATLAB support for writing numeric datatypes to a Feather file
> ---
>
> Key: ARROW-3897
> URL: https://issues.apache.org/jira/browse/ARROW-3897
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: MATLAB
>Reporter: Rylan Dmello
>Assignee: Kevin Gurney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>   Original Estimate: 48h
>  Time Spent: 9h 10m
>  Remaining Estimate: 38h 50m
>
> Currently the MATLAB - Feather interface supports reading numeric datatypes 
> (double, single, uint* and int*) from a Feather file. We should also add 
> support for writing these numeric datatypes to a Feather file.





[jira] [Commented] (ARROW-5548) [Documentation] http://arrow.apache.org/docs/latest/ is not latest

2019-06-11 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861271#comment-16861271
 ] 

Antoine Pitrou commented on ARROW-5548:
---

I don't think it's github pages. AFAIU, all *.apache.org sites are hosted on 
Apache Infrastructure.

> [Documentation] http://arrow.apache.org/docs/latest/ is not latest
> --
>
> Key: ARROW-5548
> URL: https://issues.apache.org/jira/browse/ARROW-5548
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Website
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> In testing out the Dockerfile for building the docs, I noticed it created an 
> asf-site/docs/latest directory at the end. Out of curiosity, I went to 
> [http://arrow.apache.org/docs/latest/], and it reports a version of 
> {{0.11.1.dev473+g6ed02454}}, which is not close to "latest".
> I'd like to see this "latest" site get updated automatically. I'm working on 
> getting this Docker setup complete (cf. 
> https://issues.apache.org/jira/browse/ARROW-5497), and once that's working, 
> it should be feasible to add a Travis-CI job to update /docs/latest on every 
> commit to master to apache/arrow. 
> cc [~wesmckinn]





[jira] [Assigned] (ARROW-4139) [Python] Cast Parquet column statistics to unicode if UTF8 ConvertedType is set

2019-06-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-4139:
---

Assignee: Wes McKinney

> [Python] Cast Parquet column statistics to unicode if UTF8 ConvertedType is 
> set
> ---
>
> Key: ARROW-4139
> URL: https://issues.apache.org/jira/browse/ARROW-4139
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Matthew Rocklin
>Assignee: Wes McKinney
>Priority: Minor
>  Labels: parquet, pull-request-available, python
> Fix For: 0.14.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When writing Pandas data to Parquet format and reading it back again, I find 
> that the statistics of text columns are stored as byte arrays rather than as 
> unicode text. 
> I'm not sure if this is a bug in Arrow, PyArrow, or just in my understanding 
> of how best to manage statistics.  (I'd be quite happy to learn that it was 
> the latter).
> Here is a minimal example
> {code:python}
> import pandas as pd
> df = pd.DataFrame({'x': ['a']})
> df.to_parquet('df.parquet')
> import pyarrow.parquet as pq
> pf = pq.ParquetDataset('df.parquet')
> piece = pf.pieces[0]
> rg = piece.row_group(0)
> md = piece.get_metadata(pq.ParquetFile)
> rg = md.row_group(0)
> c = rg.column(0)
> >>> c
> 
>   file_offset: 63
>   file_path: 
>   physical_type: BYTE_ARRAY
>   num_values: 1
>   path_in_schema: x
>   is_stats_set: True
>   statistics:
> 
>   has_min_max: True
>   min: b'a'
>   max: b'a'
>   null_count: 0
>   distinct_count: 0
>   num_values: 1
>   physical_type: BYTE_ARRAY
>   compression: SNAPPY
>   encodings: ('PLAIN_DICTIONARY', 'PLAIN', 'RLE')
>   has_dictionary_page: True
>   dictionary_page_offset: 4
>   data_page_offset: 25
>   total_compressed_size: 59
>   total_uncompressed_size: 55
> >>> type(c.statistics.min)
> bytes
> {code}
> My guess is that we would want to store a logical type in the statistics like 
> UNICODE, though I don't have enough experience with Parquet data types to 
> know if this is a good idea or possible.
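
A minimal sketch of the cast the issue title asks for, in plain Python. The function decode_statistic and its string arguments are hypothetical and are not pyarrow API; the type names BYTE_ARRAY and UTF8 are taken from the report and the issue title. The idea is to decode a min/max statistic only when the column is a byte array annotated as UTF8, and to leave every other value untouched.

```python
def decode_statistic(value, physical_type, converted_type):
    """Return a min/max statistic as str when the column is UTF8 text.

    Hypothetical helper: statistics of BYTE_ARRAY columns without the UTF8
    annotation are genuinely binary, so they are returned unchanged.
    """
    is_text = physical_type == "BYTE_ARRAY" and converted_type == "UTF8"
    if is_text and isinstance(value, bytes):
        return value.decode("utf-8")
    return value
```

Applied to the example above, the min of b'a' would come back as 'a', while the statistics of a plain binary column would stay as bytes.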





[jira] [Commented] (ARROW-4139) [Python] Cast Parquet column statistics to unicode if UTF8 ConvertedType is set

2019-06-11 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861272#comment-16861272
 ] 

Wes McKinney commented on ARROW-4139:
-

I can pick up this one for 0.14.0, but someone else is free to take it from me

> [Python] Cast Parquet column statistics to unicode if UTF8 ConvertedType is 
> set
> ---
>
> Key: ARROW-4139
> URL: https://issues.apache.org/jira/browse/ARROW-4139
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Matthew Rocklin
>Assignee: Wes McKinney
>Priority: Minor
>  Labels: parquet, pull-request-available, python
> Fix For: 0.14.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When writing Pandas data to Parquet format and reading it back again, I find 
> that the statistics of text columns are stored as byte arrays rather than as 
> unicode text. 
> I'm not sure if this is a bug in Arrow, PyArrow, or just in my understanding 
> of how best to manage statistics.  (I'd be quite happy to learn that it was 
> the latter).
> Here is a minimal example
> {code:python}
> import pandas as pd
> df = pd.DataFrame({'x': ['a']})
> df.to_parquet('df.parquet')
> import pyarrow.parquet as pq
> pf = pq.ParquetDataset('df.parquet')
> piece = pf.pieces[0]
> rg = piece.row_group(0)
> md = piece.get_metadata(pq.ParquetFile)
> rg = md.row_group(0)
> c = rg.column(0)
> >>> c
> 
>   file_offset: 63
>   file_path: 
>   physical_type: BYTE_ARRAY
>   num_values: 1
>   path_in_schema: x
>   is_stats_set: True
>   statistics:
> 
>   has_min_max: True
>   min: b'a'
>   max: b'a'
>   null_count: 0
>   distinct_count: 0
>   num_values: 1
>   physical_type: BYTE_ARRAY
>   compression: SNAPPY
>   encodings: ('PLAIN_DICTIONARY', 'PLAIN', 'RLE')
>   has_dictionary_page: True
>   dictionary_page_offset: 4
>   data_page_offset: 25
>   total_compressed_size: 59
>   total_uncompressed_size: 55
> >>> type(c.statistics.min)
> bytes
> {code}
> My guess is that we would want to store a logical type in the statistics like 
> UNICODE, though I don't have enough experience with Parquet data types to 
> know if this is a good idea or possible.





[jira] [Commented] (ARROW-5548) [Documentation] http://arrow.apache.org/docs/latest/ is not latest

2019-06-11 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861273#comment-16861273
 ] 

Antoine Pitrou commented on ARROW-5548:
---

IOW, it almost certainly needs a manual update step currently...

> [Documentation] http://arrow.apache.org/docs/latest/ is not latest
> --
>
> Key: ARROW-5548
> URL: https://issues.apache.org/jira/browse/ARROW-5548
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Website
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> In testing out the Dockerfile for building the docs, I noticed it created an 
> asf-site/docs/latest directory at the end. Out of curiosity, I went to 
> [http://arrow.apache.org/docs/latest/], and it reports a version of 
> {{0.11.1.dev473+g6ed02454}}, which is not close to "latest".
> I'd like to see this "latest" site get updated automatically. I'm working on 
> getting this Docker setup complete (cf. 
> https://issues.apache.org/jira/browse/ARROW-5497), and once that's working, 
> it should be feasible to add a Travis-CI job to update /docs/latest on every 
> commit to master to apache/arrow. 
> cc [~wesmckinn]
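
The version string quoted above follows the usual "base.devN+gHASH" development-version pattern, so a staleness check could start by splitting it apart. A rough sketch using only the standard library; the function name parse_dev_version is an assumption for illustration and is not part of any Arrow tooling.

```python
import re

# base version, number of commits since that version, abbreviated git hash
_DEV_VERSION = re.compile(
    r"^(?P<base>\d+(?:\.\d+)*)\.dev(?P<distance>\d+)\+g(?P<commit>[0-9a-f]+)$"
)


def parse_dev_version(version):
    """Return (base, distance, commit) for a dev version, or None otherwise."""
    match = _DEV_VERSION.match(version)
    if match is None:
        return None
    return match.group("base"), int(match.group("distance")), match.group("commit")


print(parse_dev_version("0.11.1.dev473+g6ed02454"))
print(parse_dev_version("0.14.0"))
```

A CI job could compare the parsed commit hash with the current head of master to decide whether the published docs are out of date.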





[jira] [Assigned] (ARROW-3650) [Python] Mixed column indexes are read back as strings

2019-06-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-3650:
---

Assignee: Joris Van den Bossche

> [Python] Mixed column indexes are read back as strings 
> ---
>
> Key: ARROW-3650
> URL: https://issues.apache.org/jira/browse/ARROW-3650
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.11.1
>Reporter: Armin Berres
>Assignee: Joris Van den Bossche
>Priority: Major
>  Labels: parquet, pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Consider the following example: 
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> df = pd.DataFrame(1, index=[pd.to_datetime('2018/01/01')],
>                   columns=['a string', pd.to_datetime('2018/01/02')])
> table = pa.Table.from_pandas(df)
> pq.write_table(table, 'test.parquet')
> ref_df = pq.read_pandas('test.parquet').to_pandas()
> print(df.columns)
> # Index(['a string', 2018-01-02 00:00:00], dtype='object')
> print(ref_df.columns)
> # Index(['a string', '2018-01-02 00:00:00'], dtype='object')
> {code}
> The serialized data frame has an index with a string and a datetime field 
> (this happened when resetting the index of a formerly datetime-only column).
> When reading the data back, the datetime is converted into a string.
> When looking at the schema I find {{"pandas_type": "mixed", "numpy_type": 
> "object"}} before serializing and {{"pandas_type": "unicode", "numpy_type": 
> "object"}} after reading back. So the schema was aware of the mixed type 
> but did not store the actual types.
> The same happens with other types like numbers as well. One can produce 
> interesting situations:
> {{pd.DataFrame(1, index=[pd.to_datetime('2018/01/01')], columns=['1', 1])}} 
> can be written but fails to be read back, as the index is no longer unique, 
> with '1' showing up twice.
> If this is not a bug but expected behavior, maybe the user should be warned 
> that information is lost? For example with a {{NotImplemented}} exception.
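
The collision described above can be detected before writing. A minimal sketch in plain Python, assuming column labels are converted with str() on the way out; the helper names colliding_labels and non_string_labels are made up for this example and do not exist in pandas or PyArrow.

```python
from collections import Counter


def non_string_labels(columns):
    """Labels whose type would be lost if they were stored as strings."""
    return [label for label in columns if not isinstance(label, str)]


def colliding_labels(columns):
    """Stringified labels that more than one original label maps to."""
    counts = Counter(str(label) for label in columns)
    return sorted(text for text, n in counts.items() if n > 1)


columns = ["1", 1]
print(non_string_labels(columns))
print(colliding_labels(columns))
```

A writer could warn when the first list is non-empty and refuse to write when the second one is, since the result could not be read back with unique labels.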





[jira] [Resolved] (ARROW-3650) [Python] Mixed column indexes are read back as strings

2019-06-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3650.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4244
[https://github.com/apache/arrow/pull/4244]

> [Python] Mixed column indexes are read back as strings 
> ---
>
> Key: ARROW-3650
> URL: https://issues.apache.org/jira/browse/ARROW-3650
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.11.1
>Reporter: Armin Berres
>Priority: Major
>  Labels: parquet, pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Consider the following example: 
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> df = pd.DataFrame(1, index=[pd.to_datetime('2018/01/01')],
>                   columns=['a string', pd.to_datetime('2018/01/02')])
> table = pa.Table.from_pandas(df)
> pq.write_table(table, 'test.parquet')
> ref_df = pq.read_pandas('test.parquet').to_pandas()
> print(df.columns)
> # Index(['a string', 2018-01-02 00:00:00], dtype='object')
> print(ref_df.columns)
> # Index(['a string', '2018-01-02 00:00:00'], dtype='object')
> {code}
> The serialized data frame has an index with a string and a datetime field 
> (this happened when resetting the index of a formerly datetime-only column).
> When reading the data back, the datetime is converted into a string.
> When looking at the schema I find {{"pandas_type": "mixed", "numpy_type": 
> "object"}} before serializing and {{"pandas_type": "unicode", "numpy_type": 
> "object"}} after reading back. So the schema was aware of the mixed type 
> but did not store the actual types.
> The same happens with other types like numbers as well. One can produce 
> interesting situations:
> {{pd.DataFrame(1, index=[pd.to_datetime('2018/01/01')], columns=['1', 1])}} 
> can be written but fails to be read back, as the index is no longer unique, 
> with '1' showing up twice.
> If this is not a bug but expected behavior, maybe the user should be warned 
> that information is lost? For example with a {{NotImplemented}} exception.





[jira] [Resolved] (ARROW-5554) Add a python wrapper for arrow::Concatenate

2019-06-11 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-5554.
---
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4519
[https://github.com/apache/arrow/pull/4519]

> Add a python wrapper for arrow::Concatenate
> ---
>
> Key: ARROW-5554
> URL: https://issues.apache.org/jira/browse/ARROW-5554
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.14.0
>Reporter: Zhuo Peng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (ARROW-5554) Add a python wrapper for arrow::Concatenate

2019-06-11 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-5554:
-

Assignee: Zhuo Peng

> Add a python wrapper for arrow::Concatenate
> ---
>
> Key: ARROW-5554
> URL: https://issues.apache.org/jira/browse/ARROW-5554
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.14.0
>Reporter: Zhuo Peng
>Assignee: Zhuo Peng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-5312) [C++] Move JSON integration testing utilities to arrow/testing and libarrow_testing.so

2019-06-11 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861308#comment-16861308
 ] 

Antoine Pitrou commented on ARROW-5312:
---

Isn't this at odds with ARROW-2857 ("Expose integration test JSON read/write in 
Python API")?

> [C++] Move JSON integration testing utilities to arrow/testing and 
> libarrow_testing.so
> --
>
> Key: ARROW-5312
> URL: https://issues.apache.org/jira/browse/ARROW-5312
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> It's not necessary to have this code in libarrow.so. Let's tackle after 
> ARROW-3144 and ARROW-835





[jira] [Commented] (ARROW-5548) [Documentation] http://arrow.apache.org/docs/latest/ is not latest

2019-06-11 Thread Neal Richardson (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861313#comment-16861313
 ] 

Neal Richardson commented on ARROW-5548:


In that case, we should {{rm -rf docs/latest/}} since it will always be a lie. 
IMO we should still look into CI/CD for the docs site, just perhaps not on 
apache.org.

> [Documentation] http://arrow.apache.org/docs/latest/ is not latest
> --
>
> Key: ARROW-5548
> URL: https://issues.apache.org/jira/browse/ARROW-5548
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Website
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> In testing out the Dockerfile for building the docs, I noticed it created an 
> asf-site/docs/latest directory at the end. Out of curiosity, I went to 
> [http://arrow.apache.org/docs/latest/], and it reports a version of 
> {{0.11.1.dev473+g6ed02454}}, which is not close to "latest".
> I'd like to see this "latest" site get updated automatically. I'm working on 
> getting this Docker setup complete (cf. 
> https://issues.apache.org/jira/browse/ARROW-5497), and once that's working, 
> it should be feasible to add a Travis-CI job to update /docs/latest on every 
> commit to master to apache/arrow. 
> cc [~wesmckinn]





[jira] [Created] (ARROW-5556) [Doc] Document JSON reader

2019-06-11 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-5556:
-

 Summary: [Doc] Document JSON reader
 Key: ARROW-5556
 URL: https://issues.apache.org/jira/browse/ARROW-5556
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation, Python
Reporter: Antoine Pitrou


The JSON reader API should be documented at least on the Python side.
See {{docs/source/python/csv.rst}} and {{docs/source/python/api/formats.rst}} 
for inspiration.





[jira] [Commented] (ARROW-5556) [Doc] Document JSON reader

2019-06-11 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861319#comment-16861319
 ] 

Antoine Pitrou commented on ARROW-5556:
---

[~bkietz]

> [Doc] Document JSON reader
> --
>
> Key: ARROW-5556
> URL: https://issues.apache.org/jira/browse/ARROW-5556
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Python
>Reporter: Antoine Pitrou
>Priority: Major
>
> The JSON reader API should be documented at least on the Python side.
> See {{docs/source/python/csv.rst}} and {{docs/source/python/api/formats.rst}} 
> for inspiration.





[jira] [Created] (ARROW-5557) [C++] Investigate performance of VisitBitsUnrolled on different platforms

2019-06-11 Thread Rylan Dmello (JIRA)
Rylan Dmello created ARROW-5557:
---

 Summary: [C++] Investigate performance of VisitBitsUnrolled on 
different platforms
 Key: ARROW-5557
 URL: https://issues.apache.org/jira/browse/ARROW-5557
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Reporter: Rylan Dmello


Investigate performance of `VisitBitsUnrolled` utility on different platforms, 
based on [this 
thread|https://github.com/apache/arrow/pull/4328#discussion_r292515822]





[jira] [Commented] (ARROW-3927) [C++] JSON integration test segfaults on Alpine

2019-06-11 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861335#comment-16861335
 ] 

Antoine Pitrou commented on ARROW-3927:
---

The build fails here:
{code}
FAILED: debug/libarrow_testing.so.14.0.0 
: && /usr/bin/ccache /usr/bin/g++ -fPIC -Wno-noexcept-type  
-fdiagnostics-color=always -ggdb -O0  -Wall -Wno-conversion 
-Wno-sign-conversion -Wno-unused-variable -Werror -msse4.2  -g   -shared 
-Wl,-soname,libarrow_testing.so.14 -o debug/libarrow_testing.so.14.0.0 
src/arrow/CMakeFiles/arrow_testing_objlib.dir/io/test-common.cc.o 
src/arrow/CMakeFiles/arrow_testing_objlib.dir/ipc/test-common.cc.o 
src/arrow/CMakeFiles/arrow_testing_objlib.dir/filesystem/test-util.cc.o 
src/arrow/CMakeFiles/arrow_testing_objlib.dir/testing/gtest_util.cc.o 
src/arrow/CMakeFiles/arrow_testing_objlib.dir/testing/random.cc.o  
-Wl,-rpath,/build/cpp/debug:/build/cpp/googletest_ep-prefix/src/googletest_ep/lib:
 debug/libarrow.so.14.0.0 
googletest_ep-prefix/src/googletest_ep/lib/libgtestd.so 
double-conversion_ep/src/double-conversion_ep/lib/libdouble-conversion.a 
/usr/lib/libcrypto.so brotli_ep/src/brotli_ep-install/lib/libbrotlienc-static.a 
brotli_ep/src/brotli_ep-install/lib/libbrotlidec-static.a 
brotli_ep/src/brotli_ep-install/lib/libbrotlicommon-static.a 
glog_ep-prefix/src/glog_ep/lib/libglog.a -ldl 
jemalloc_ep-prefix/src/jemalloc_ep/dist//lib/libjemalloc_pic.a -lrt && :
g++: error: googletest_ep-prefix/src/googletest_ep/lib/libgtestd.so: No such 
file or directory
{code}

> [C++] JSON integration test segfaults on Alpine
> ---
>
> Key: ARROW-3927
> URL: https://issues.apache.org/jira/browse/ARROW-3927
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: alpine
>
> {code}
> $ docker-compose build cpp-alpine
> $ docker-compose run cpp-alpine bash
> bash-4.4# ulimit -c unlimited
> bash-4.4# /arrow/ci/docker_build_cpp.sh
> bash-4.4# /build/cpp/debug/json-integration-test
> [==] Running 2 tests from 1 test case.
> [--] Global test environment set-up.
> [--] 2 tests from TestJSONIntegration
> [ RUN  ] TestJSONIntegration.ConvertAndValidate
> unknown file: Failure
> C++ exception with description "std::bad_alloc" thrown in the test body.
> [  FAILED  ] TestJSONIntegration.ConvertAndValidate (19 ms)
> [ RUN  ] TestJSONIntegration.ErrorStates
> Segmentation fault (core dumped)
> {code}
> Backtrace:
> {code}
> bash-4.4# gdb /build/cpp/debug/json-integration-test -c core
> GNU gdb (GDB) 8.0.1
> Copyright (C) 2017 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later 
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-alpine-linux-musl".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> .
> Find the GDB manual and other documentation resources online at:
> .
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from /build/cpp/debug/json-integration-test...done.
> warning: core file may not match specified executable file.
> [New LWP 225]
> warning: Can't read pathname for load map: No error information.
> Core was generated by `/build/cpp/debug/json-integration-test'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x7f4458ad8c9e in std::string::_Rep::_M_dispose(std::allocator 
> const&) () from /usr/lib/libstdc++.so.6
> (gdb) bt
> #0  0x7f4458ad8c9e in std::string::_Rep::_M_dispose(std::allocator 
> const&) () from /usr/lib/libstdc++.so.6
> #1  0x7f4458ad8cd3 in std::basic_string, 
> std::allocator >::~basic_string() () from /usr/lib/libstdc++.so.6
> #2  0x55eb9abdc402 in boost::filesystem::path::~path 
> (this=0x7ffda38537f8, __in_chrg=) at 
> /usr/include/boost/filesystem/path.hpp:56
> #3  0x55eb9abd97c7 in arrow::ipc::temp_path () at 
> /arrow/cpp/src/arrow/ipc/json-integration-test.cc:241
> #4  0x55eb9abdca1a in arrow::ipc::TestJSONIntegration::mkstemp 
> (this=0x7f4458f7c3c0) at /arrow/cpp/src/arrow/ipc/json-integration-test.cc:249
> #5  0x55eb9abda347 in 
> arrow::ipc::TestJSONIntegration_ErrorStates_Test::TestBody 
> (this=0x7f4458f7c3c0) at /arrow/cpp/src/arrow/ipc/json-integration-test.cc:391
> #6  0x7f4459c26877 in 
> testing::internal::HandleSehExceptionsInMethodIfSupported void> (object=0x7f4458f7c3c0, method=&virtual testing::Test::TestBody(), 
> location=0x7f4459c36c4b "the test body")
> at 
> /

[jira] [Updated] (ARROW-5465) [Crossbow] Support writing submitted job definition yaml to a file

2019-06-11 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-5465:
---
Summary: [Crossbow] Support writing submitted job definition yaml to a file 
 (was: [Crossbow] Support writing job definition to a file on submit )

> [Crossbow] Support writing submitted job definition yaml to a file
> --
>
> Key: ARROW-5465
> URL: https://issues.apache.org/jira/browse/ARROW-5465
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> In a similar fashion to what archery benchmark does. This is required to 
> consume the command's output from a buildbot build step.





[jira] [Assigned] (ARROW-5556) [Doc] Document JSON reader

2019-06-11 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-5556:
-

Assignee: Antoine Pitrou

> [Doc] Document JSON reader
> --
>
> Key: ARROW-5556
> URL: https://issues.apache.org/jira/browse/ARROW-5556
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>
> The JSON reader API should be documented at least on the Python side.
> See {{docs/source/python/csv.rst}} and {{docs/source/python/api/formats.rst}} 
> for inspiration.





[jira] [Updated] (ARROW-5556) [Doc] Document JSON reader

2019-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5556:
--
Labels: pull-request-available  (was: )

> [Doc] Document JSON reader
> --
>
> Key: ARROW-5556
> URL: https://issues.apache.org/jira/browse/ARROW-5556
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> The JSON reader API should be documented at least on the Python side.
> See {{docs/source/python/csv.rst}} and {{docs/source/python/api/formats.rst}} 
> for inspiration.





[jira] [Commented] (ARROW-5548) [Documentation] http://arrow.apache.org/docs/latest/ is not latest

2019-06-11 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861390#comment-16861390
 ] 

Wes McKinney commented on ARROW-5548:
-

I don't think it's important (or even necessarily a good idea) to publish docs 
for non-released versions of the project on arrow.apache.org. From the 
perspective of Apache Arrow, the software "does not exist" until it is 
released. It'd be easier and more manageable for a third party to host 
developer docs. Perhaps we (Ursa Labs affiliates) can create a "Developer 
Resources" page linked off ursalabs.org containing links to nightly package 
builds, nightly docs, and other resources that are non-PMC-sanctioned.

> [Documentation] http://arrow.apache.org/docs/latest/ is not latest
> --
>
> Key: ARROW-5548
> URL: https://issues.apache.org/jira/browse/ARROW-5548
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Website
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> In testing out the Dockerfile for building the docs, I noticed it created an 
> asf-site/docs/latest directory at the end. Out of curiosity, I went to 
> [http://arrow.apache.org/docs/latest/], and it reports a version of 
> {{0.11.1.dev473+g6ed02454}}, which is not close to "latest".
> I'd like to see this "latest" site get updated automatically. I'm working on 
> getting this Docker setup complete (cf. 
> https://issues.apache.org/jira/browse/ARROW-5497), and once that's working, 
> it should be feasible to add a Travis-CI job to update /docs/latest on every 
> commit to master to apache/arrow. 
> cc [~wesmckinn]





[jira] [Commented] (ARROW-5270) [C++] Reenable Valgrind on Travis-CI

2019-06-11 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861398#comment-16861398
 ] 

Antoine Pitrou commented on ARROW-5270:
---

We now test with ASAN and UBSAN on Travis, so perhaps this is not as important 
as it used to be. [~wesmckinn][~emkornfi...@gmail.com] opinions?

> [C++] Reenable Valgrind on Travis-CI
> 
>
> Key: ARROW-5270
> URL: https://issues.apache.org/jira/browse/ARROW-5270
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration
>Reporter: Antoine Pitrou
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Running Valgrind on Travis-CI was disabled in ARROW-4611 (apparently because 
> of issues within the re2 library).
> We should reenable it at some point in order to exercise the reliability of 
> our C++ code.
> (and/or have a build with another piece of instrumentation enabled such as 
> ASAN)





[jira] [Comment Edited] (ARROW-5270) [C++] Reenable Valgrind on Travis-CI

2019-06-11 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861398#comment-16861398
 ] 

Antoine Pitrou edited comment on ARROW-5270 at 6/11/19 7:35 PM:


We now test with ASAN and UBSAN on Travis, so perhaps this is not as important 
as it used to be. [~wesmckinn] [~emkornfi...@gmail.com] opinions?


was (Author: pitrou):
We now test with ASAN and UBSAN on Travis, so perhaps this is not as important 
as it used to be. [~wesmckinn][~emkornfi...@gmail.com] opinions?

> [C++] Reenable Valgrind on Travis-CI
> 
>
> Key: ARROW-5270
> URL: https://issues.apache.org/jira/browse/ARROW-5270
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration
>Reporter: Antoine Pitrou
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Running Valgrind on Travis-CI was disabled in ARROW-4611 (apparently because 
> of issues within the re2 library).
> We should reenable it at some point in order to exercise the reliability of 
> our C++ code.
> (and/or have a build with another piece of instrumentation enabled such as 
> ASAN)





[jira] [Created] (ARROW-5558) [C++] Support Array::View on arrays with non-zero offsets

2019-06-11 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5558:
---

 Summary: [C++] Support Array::View on arrays with non-zero offsets
 Key: ARROW-5558
 URL: https://issues.apache.org/jira/browse/ARROW-5558
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney


Follow up work to initial implementation of {{Array::View}} in ARROW-1774





[jira] [Updated] (ARROW-5555) [R] Add install_arrow() function to assist the user in obtaining C++ runtime libraries

2019-06-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-5555:

Summary: [R] Add install_arrow() function to assist the user in obtaining 
C++ runtime libraries  (was: [R] install_arrow())

> [R] Add install_arrow() function to assist the user in obtaining C++ runtime 
> libraries
> --
>
> Key: ARROW-5555
> URL: https://issues.apache.org/jira/browse/ARROW-5555
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 0.14.0
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>
> Following ARROW-5488, it will be possible to install the R package without 
> having libarrow installed, but you won't be able to do anything until you do. 
> The error message you get when trying to use the package directs you to call 
> {{install_arrow()}}. 
> This function will at a minimum give a recommendation of steps to take to 
> install the library. In some cases, we may be able to download and install it 
> for the user.





[jira] [Commented] (ARROW-5548) [Documentation] http://arrow.apache.org/docs/latest/ is not latest

2019-06-11 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861403#comment-16861403
 ] 

Antoine Pitrou commented on ARROW-5548:
---

Assuming that these can be linked to from the official website, that sounds 
reasonable to me.

> [Documentation] http://arrow.apache.org/docs/latest/ is not latest
> --
>
> Key: ARROW-5548
> URL: https://issues.apache.org/jira/browse/ARROW-5548
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Website
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> In testing out the Dockerfile for building the docs, I noticed it created an 
> asf-site/docs/latest directory at the end. Out of curiosity, I went to 
> [http://arrow.apache.org/docs/latest/], and it reports a version of 
> {{0.11.1.dev473+g6ed02454}}, which is not close to "latest".
> I'd like to see this "latest" site get updated automatically. I'm working on 
> getting this Docker setup complete (cf. 
> https://issues.apache.org/jira/browse/ARROW-5497), and once that's working, 
> it should be feasible to add a Travis-CI job to update /docs/latest on every 
> commit to master to apache/arrow. 
> cc [~wesmckinn]





[jira] [Resolved] (ARROW-1774) [C++] Add "view" function to create zero-copy views for compatible types, if supported

2019-06-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1774.
-
Resolution: Fixed

Issue resolved by pull request 4482
[https://github.com/apache/arrow/pull/4482]

> [C++] Add "view" function to create zero-copy views for compatible types, if 
> supported
> --
>
> Key: ARROW-1774
> URL: https://issues.apache.org/jira/browse/ARROW-1774
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Similar to NumPy's {{ndarray.view}}, but with the restriction that the input 
> and output types have the same physical Arrow memory layout. This might be as 
> simple as adding a "zero copy only" option to the existing {{Cast}} kernel
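
The restriction described above (same physical layout, no copy) can be illustrated with Python's standard library alone. This is only an analogy and not Arrow's {{Array::View}}: it reinterprets a buffer of signed ints as unsigned ints of the same width through memoryview.cast. The function name view_as_unsigned is made up for the example.

```python
from array import array


def view_as_unsigned(values):
    """Reinterpret an array('i') buffer as unsigned ints without copying."""
    # memoryview.cast only converts to or from a byte format, hence two steps.
    return memoryview(values).cast("B").cast("I")


signed = array("i", [1, -1, 2])
unsigned = view_as_unsigned(signed)

# Both objects share one buffer: a write through `signed` shows up in the view.
signed[0] = 7
print(unsigned[0])
# -1 reinterpreted as unsigned is the largest value of that width.
print(unsigned[1] == 2 ** (8 * signed.itemsize) - 1)
```

Because both element types have the same width, the view needs no new allocation; a cast between types of different widths would have to copy, which is the case a "zero copy only" option would reject.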





[jira] [Commented] (ARROW-3290) [C++] Toolchain support for secure gRPC

2019-06-11 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861405#comment-16861405
 ] 

Antoine Pitrou commented on ARROW-3290:
---

I'm not surprised that TLS would hurt performance. Even hardware-accelerated 
AES has a significant cost that varies quite a bit depending on the CPU model.

> [C++] Toolchain support for secure gRPC 
> 
>
> Key: ARROW-3290
> URL: https://issues.apache.org/jira/browse/ARROW-3290
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: David Li
>Priority: Major
> Fix For: 0.14.0
>
>
> In ARROW-3146 I added support for the narrow use case of CMake-installed gRPC 
> and linking with the unsecure libraries. There are a number of additional 
> dependencies to be able to connect to secure services





[jira] [Commented] (ARROW-5312) [C++] Move JSON integration testing utilities to arrow/testing and libarrow_testing.so

2019-06-11 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861408#comment-16861408
 ] 

Wes McKinney commented on ARROW-5312:
-

It is at odds. Do you think that is even useful? We could add optional Python 
bindings (that we don't ship in production wheels/conda packages) for functions 
in libarrow_testing if we actually need any of this stuff.

> [C++] Move JSON integration testing utilities to arrow/testing and 
> libarrow_testing.so
> --
>
> Key: ARROW-5312
> URL: https://issues.apache.org/jira/browse/ARROW-5312
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> It's not necessary to have this code in libarrow.so. Let's tackle after 
> ARROW-3144 and ARROW-835





[jira] [Commented] (ARROW-5538) [C++] Restrict minimum OpenSSL version to 1.0.2

2019-06-11 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861412#comment-16861412
 ] 

Antoine Pitrou commented on ARROW-5538:
---

They're not really prohibited, just discouraged. Our main concern should be API 
compatibility, IMO.

> [C++] Restrict minimum OpenSSL version to 1.0.2
> ---
>
> Key: ARROW-5538
> URL: https://issues.apache.org/jira/browse/ARROW-5538
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Deepak Majeti
>Assignee: Deepak Majeti
>Priority: Major
>
> We must enable encryption support in Arrow only if the OpenSSL version is at 
> least 1.0.2. The official documentation prohibits using older versions.
> [https://www.openssl.org/source/]
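
The version gate proposed above boils down to a tuple comparison. The real check would presumably live in Arrow's C++ build configuration; the sketch below only shows the comparison in Python, using the standard ssl module to report the OpenSSL version that Python itself is linked against. The names MINIMUM_OPENSSL and meets_minimum are assumptions for illustration.

```python
import ssl

MINIMUM_OPENSSL = (1, 0, 2)


def meets_minimum(version_info, minimum=MINIMUM_OPENSSL):
    """Return True if the (major, minor, fix, ...) tuple is at least `minimum`."""
    return tuple(version_info[:len(minimum)]) >= tuple(minimum)


# ssl.OPENSSL_VERSION_INFO describes the OpenSSL linked into this Python build.
print(ssl.OPENSSL_VERSION, meets_minimum(ssl.OPENSSL_VERSION_INFO))
```

Comparing only the leading components keeps patch and status fields from affecting the decision.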





[jira] [Commented] (ARROW-5270) [C++] Reenable Valgrind on Travis-CI

2019-06-11 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861415#comment-16861415
 ] 

Wes McKinney commented on ARROW-5270:
-

I'd be okay relegating valgrind testing to docker-compose

> [C++] Reenable Valgrind on Travis-CI
> 
>
> Key: ARROW-5270
> URL: https://issues.apache.org/jira/browse/ARROW-5270
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration
>Reporter: Antoine Pitrou
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Running Valgrind on Travis-CI was disabled in ARROW-4611 (apparently because 
> of issues within the re2 library).
> We should reenable it at some point in order to exercise the reliability of 
> our C++ code.
> (and/or have a build with another piece of instrumentation enabled such as 
> ASAN)





[jira] [Commented] (ARROW-3290) [C++] Toolchain support for secure gRPC

2019-06-11 Thread David Li (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861417#comment-16861417
 ] 

David Li commented on ARROW-3290:
-

The gRPC (and Dropbox!) people like to claim it has no impact :) 

[https://blogs.dropbox.com/tech/2019/01/courier-dropbox-migration-to-grpc/] 
search for "Encryption is not expensive"

> [C++] Toolchain support for secure gRPC 
> 
>
> Key: ARROW-3290
> URL: https://issues.apache.org/jira/browse/ARROW-3290
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: David Li
>Priority: Major
> Fix For: 0.14.0
>
>
> In ARROW-3146 I added support for the narrow use case of CMake-installed gRPC 
> and linking with the unsecure libraries. There are a number of additional 
> dependencies to be able to connect to secure services





[jira] [Updated] (ARROW-5515) [Java] Ensure JVM has sufficient capacity for large number of local reference

2019-06-11 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5515:
--
Summary: [Java] Ensure JVM has sufficient capacity for large number of 
local reference  (was: Ensure JVM has sufficient capacity for large number of 
local reference)

> [Java] Ensure JVM has sufficient capacity for large number of local reference
> -
>
> Key: ARROW-5515
> URL: https://issues.apache.org/jira/browse/ARROW-5515
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java
>Reporter: Yurui Zhou
>Assignee: Yurui Zhou
>Priority: Minor
>






[jira] [Updated] (ARROW-5515) Ensure JVM has sufficient capacity for large number of local reference

2019-06-11 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5515:
--
Component/s: (was: C++)

> Ensure JVM has sufficient capacity for large number of local reference
> --
>
> Key: ARROW-5515
> URL: https://issues.apache.org/jira/browse/ARROW-5515
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java
>Reporter: Yurui Zhou
>Assignee: Yurui Zhou
>Priority: Minor
>






[jira] [Updated] (ARROW-5519) [Java] Add ORC JNI related components to travis CI

2019-06-11 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5519:
--
Summary: [Java] Add ORC JNI related components to travis CI  (was: Add ORC 
JNI related components to travis CI)

> [Java] Add ORC JNI related components to travis CI
> --
>
> Key: ARROW-5519
> URL: https://issues.apache.org/jira/browse/ARROW-5519
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Java
>Reporter: Yurui Zhou
>Assignee: Yurui Zhou
>Priority: Major
>






[jira] [Commented] (ARROW-5514) [C++] Printer for uint64 shows wrong values

2019-06-11 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861423#comment-16861423
 ] 

Antoine Pitrou commented on ARROW-5514:
---

[~jorisvandenbossche] Do you want to take this up? Otherwise someone else can.

> [C++] Printer for uint64 shows wrong values
> ---
>
> Key: ARROW-5514
> URL: https://issues.apache.org/jira/browse/ARROW-5514
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.13.0
>Reporter: Joris Van den Bossche
>Priority: Minor
>
> From the example in ARROW-5430:
> {code}
> In [16]: pa.array([14989096668145380166, 15869664087396458664], type=pa.uint64())
> Out[16]: 
> 
> [
>   -3457647405564171450,
>   -2577079986313092952
> ]
> {code}
> I _think_ the actual conversion is correct, and it's only the printer that is 
> going wrong, as {{to_numpy}} gives the correct values:
> {code}
> In [17]: pa.array([14989096668145380166, 15869664087396458664], type=pa.uint64()).to_numpy()
> Out[17]: array([14989096668145380166, 15869664087396458664], dtype=uint64)
> {code}
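
The negative numbers in the first output are consistent with the unsigned
values being reinterpreted as signed 64-bit integers somewhere on the printing
path. A minimal sketch of that arithmetic in plain Python follows; `as_int64`
is a hypothetical helper written for illustration, not a pyarrow function, and
this does not show where in the C++ printer the reinterpretation happens.

```python
import struct

def as_int64(value):
    """Reinterpret the bits of an unsigned 64-bit integer as a signed one."""
    # Pack as unsigned 64-bit, then unpack the same eight bytes as signed.
    return struct.unpack("<q", struct.pack("<Q", value))[0]

values = [14989096668145380166, 15869664087396458664]
print([as_int64(v) for v in values])
# [-3457647405564171450, -2577079986313092952]
```

These match the values shown in Out[16], which supports the guess that the
stored data is fine and only the formatting step treats it as signed.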





[jira] [Commented] (ARROW-5548) [Documentation] http://arrow.apache.org/docs/latest/ is not latest

2019-06-11 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861422#comment-16861422
 ] 

Wes McKinney commented on ARROW-5548:
-

Possibly, but we must be careful to abide by ASF guidelines: "The general ruling 
on nightly builds is that it's up to the projects themselves, and that they 
shouldn't point the general public to them."

It would probably be okay to have a section on the website like "Third 
Party-maintained Developer Resources". This is something we could discuss 
further on the mailing list. 

> [Documentation] http://arrow.apache.org/docs/latest/ is not latest
> --
>
> Key: ARROW-5548
> URL: https://issues.apache.org/jira/browse/ARROW-5548
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Website
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> In testing out the Dockerfile for building the docs, I noticed it created an 
> asf-site/docs/latest directory at the end. Out of curiosity, I went to 
> [http://arrow.apache.org/docs/latest/], and it reports a version of 
> {{0.11.1.dev473+g6ed02454}}, which is not close to "latest".
> I'd like to see this "latest" site get updated automatically. I'm working on 
> getting this Docker setup complete (cf. 
> https://issues.apache.org/jira/browse/ARROW-5497), and once that's working, 
> it should be feasible to add a Travis-CI job to update /docs/latest on every 
> commit to master to apache/arrow. 
> cc [~wesmckinn]





[jira] [Updated] (ARROW-5458) [C++] ARMv8 parallel CRC32c computation optimization

2019-06-11 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5458:
--
Summary: [C++] ARMv8 parallel CRC32c computation optimization  (was: Apache 
Arrow parallel CRC32c computation optimization)

> [C++] ARMv8 parallel CRC32c computation optimization
> 
>
> Key: ARROW-5458
> URL: https://issues.apache.org/jira/browse/ARROW-5458
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Yuqi Gu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> ARMv8 defines the VMULL/PMULL crypto instructions.
> This patch optimizes the CRC32c calculation with these instructions when
> available, rather than using the original linear CRC instructions.
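
For readers unfamiliar with the checksum being optimized, here is a
bit-at-a-time reference sketch of CRC32c in Python. It is not the PMULL-based
approach from the patch and not Arrow's implementation; it only shows the
function that any optimized version has to reproduce. The name `crc32c` is
chosen for this example.

```python
def crc32c(data):
    """Bit-at-a-time CRC32c (Castagnoli polynomial, reflected form)."""
    poly = 0x82F63B78  # reflected representation of 0x1EDC6F41
    crc = 0xFFFFFFFF   # standard initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; XOR in the polynomial if that bit was set.
            crc = (crc >> 1) ^ (poly if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF  # standard final XOR

print(hex(crc32c(b"123456789")))  # 0xe3069283, the standard CRC-32C check value
```

An optimized implementation can be validated by comparing its output against
a slow reference like this one on random inputs.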





[jira] [Closed] (ARROW-5475) [Python] Add Python binding for arrow::Concatenate

2019-06-11 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou closed ARROW-5475.
-
   Resolution: Duplicate
Fix Version/s: (was: 0.15.0)
   0.14.0

> [Python] Add Python binding for arrow::Concatenate
> --
>
> Key: ARROW-5475
> URL: https://issues.apache.org/jira/browse/ARROW-5475
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>






[jira] [Commented] (ARROW-5428) [C++] Add option to set "read extent" in arrow::io::BufferedInputStream

2019-06-11 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861424#comment-16861424
 ] 

Antoine Pitrou commented on ARROW-5428:
---

Does something still need to be done here?

> [C++] Add option to set "read extent" in arrow::io::BufferedInputStream
> ---
>
> Key: ARROW-5428
> URL: https://issues.apache.org/jira/browse/ARROW-5428
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> I'm looking at simplifying libparquet to use common IO interfaces rather than 
> its own custom ones
> The {{parquet::BufferedInputStream}} interface has an option to not read 
> beyond a particular number of bytes. For example, if we were reading a 32MB 
> block with 1MB buffering, then we will not consume more than 32MB from the 
> raw InputStream. 
> This seems like a fairly trivial addition to 
> {{arrow::io::BufferedInputStream}}: track the total bytes read and do not read 
> beyond the configured extent. We'll have to add a method like 
> {{set_read_extent}}.
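
The idea described above can be sketched in a few lines of Python. This is
only an illustration of the bookkeeping, assuming a wrapper that counts bytes
pulled from the raw stream; `ExtentLimitedReader` and its parameters are
hypothetical names and do not correspond to the Arrow C++ API.

```python
import io

class ExtentLimitedReader:
    """Buffered reader that never consumes more than `extent` raw bytes."""

    def __init__(self, raw, buffer_size, extent):
        self.raw = raw
        self.buffer_size = buffer_size
        self.extent = extent
        self.raw_bytes_read = 0  # total bytes pulled from the raw stream
        self.buffer = b""

    def read(self, nbytes):
        # Refill in buffer-sized chunks, clipped to what the extent allows.
        while len(self.buffer) < nbytes:
            remaining = self.extent - self.raw_bytes_read
            if remaining <= 0:
                break
            chunk = self.raw.read(min(self.buffer_size, remaining))
            if not chunk:
                break  # raw stream exhausted before reaching the extent
            self.raw_bytes_read += len(chunk)
            self.buffer += chunk
        out, self.buffer = self.buffer[:nbytes], self.buffer[nbytes:]
        return out

raw = io.BytesIO(bytes(100))
reader = ExtentLimitedReader(raw, buffer_size=16, extent=40)
data = reader.read(1000)
print(len(data), raw.tell())  # 40 40
```

Even though the caller asks for 1000 bytes and the raw stream holds 100, the
wrapper stops at the 40-byte extent and leaves the rest of the raw stream
untouched.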





[jira] [Updated] (ARROW-5426) [C++] Repair Windows static CRT build configuration after ARROW-5403

2019-06-11 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5426:
--
Priority: Minor  (was: Major)

> [C++] Repair Windows static CRT build configuration after ARROW-5403
> 
>
> Key: ARROW-5426
> URL: https://issues.apache.org/jira/browse/ARROW-5426
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Minor
>
> In ARROW-5403 we changed to using the googletest DLL for unit testing because 
> of the issue reported there. However, the static CRT build only works when 
> all libraries are statically-linked into unit test executables, including 
> gtest.lib and arrow.lib. Using gtest.dll with static CRT does not seem to 
> work because gtest.dll has its own statically-linked CRT





[jira] [Resolved] (ARROW-5428) [C++] Add option to set "read extent" in arrow::io::BufferedInputStream

2019-06-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5428.
-
Resolution: Fixed

Nope. I did it

> [C++] Add option to set "read extent" in arrow::io::BufferedInputStream
> ---
>
> Key: ARROW-5428
> URL: https://issues.apache.org/jira/browse/ARROW-5428
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> I'm looking at simplifying libparquet to use common IO interfaces rather than 
> its own custom ones
> The {{parquet::BufferedInputStream}} interface has an option to not read 
> beyond a particular number of bytes. For example, if we were reading a 32MB 
> block with 1MB buffering, then we will not consume more than 32MB from the 
> raw InputStream. 
> This seems like a fairly trivial addition to 
> {{arrow::io::BufferedInputStream}}: track the total bytes read and do not read 
> beyond the configured extent. We'll have to add a method like 
> {{set_read_extent}}.





[jira] [Commented] (ARROW-4784) [C++][CI] Re-enable flaky mingw tests.

2019-06-11 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861427#comment-16861427
 ] 

Antoine Pitrou commented on ARROW-4784:
---

[~kou] [~emkornfi...@gmail.com] is this still an issue?

> [C++][CI] Re-enable flaky mingw tests.
> --
>
> Key: ARROW-4784
> URL: https://issues.apache.org/jira/browse/ARROW-4784
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration
>Reporter: Micah Kornfield
>Priority: Major
>  Labels: ci-failure
>






[jira] [Commented] (ARROW-840) [Python] Provide Python API for creating user-defined data types that can survive Arrow IPC

2019-06-11 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861428#comment-16861428
 ] 

Wes McKinney commented on ARROW-840:


[~jorisvandenbossche] [~pitrou] what do you think is the path ahead for this 
project? There are two distinct areas of work:

* Defining extension types in Python and providing Python exposure for already 
C++-defined types
* Bridging between extension types (which may be C++ or Python-defined) and 
pandas

This issue ARROW-840 covers the former but not the latter. It would be nice to 
have this feature available in 0.14.0. Given today's date (June 11), though, 
I'm concerned that we may miss the window for 0.14.0.

> [Python] Provide Python API for creating user-defined data types that can 
> survive Arrow IPC
> ---
>
> Key: ARROW-840
> URL: https://issues.apache.org/jira/browse/ARROW-840
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> The user will provide:
> * Data type subclass that can indicate the physical storage type
> * "get state" and "set state" functions for serializing custom metadata to 
> bytes
> * An optional function for "boxing" scalar values from the physical array 
> storage
> Internally, this will build on an analogous C++ API for defining user data 
> types





[jira] [Commented] (ARROW-951) [JS] Fix generated API documentation

2019-06-11 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861429#comment-16861429
 ] 

Wes McKinney commented on ARROW-951:


[~bhulette] any luck on this one? I'm removing from any release milestone for 
the time being

> [JS] Fix generated API documentation
> 
>
> Key: ARROW-951
> URL: https://issues.apache.org/jira/browse/ARROW-951
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Brian Hulette
>Priority: Minor
>  Labels: documentation
> Fix For: 0.14.0
>
>
> The current generated API documentation doesn't respect the project's 
> namespaces; it simply lists all exported objects. We should see if we can 
> make typedoc display the project's structure (even if it means re-structuring 
> the code a bit), or find another approach for doc generation.





[jira] [Updated] (ARROW-951) [JS] Fix generated API documentation

2019-06-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-951:
---
Fix Version/s: (was: 0.14.0)

> [JS] Fix generated API documentation
> 
>
> Key: ARROW-951
> URL: https://issues.apache.org/jira/browse/ARROW-951
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Brian Hulette
>Priority: Minor
>  Labels: documentation
>
> The current generated API documentation doesn't respect the project's 
> namespaces; it simply lists all exported objects. We should see if we can 
> make typedoc display the project's structure (even if it means re-structuring 
> the code a bit), or find another approach for doc generation.





[jira] [Updated] (ARROW-1574) [C++] Implement kernel function that converts a dense array to dictionary given known dictionary

2019-06-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1574:

Fix Version/s: (was: 0.14.0)

> [C++] Implement kernel function that converts a dense array to dictionary 
> given known dictionary
> 
>
> Key: ARROW-1574
> URL: https://issues.apache.org/jira/browse/ARROW-1574
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: Analytics
>
> This may simply be a special case of cast using a dictionary type
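
As a rough illustration of what such a kernel computes, the sketch below
encodes a dense list of values against a dictionary that is already known.
It is plain Python, not Arrow code; `dictionary_encode` is a hypothetical
name, and treating values missing from the dictionary as an error is an
assumption made here, since the issue does not say how they should be handled.

```python
def dictionary_encode(values, dictionary):
    """Map each value to its index in a known dictionary (None stays None)."""
    index_of = {v: i for i, v in enumerate(dictionary)}
    indices = []
    for v in values:
        if v is None:
            indices.append(None)  # nulls carry over as null indices
        else:
            indices.append(index_of[v])  # KeyError if v is not in the dictionary
    return indices

dictionary = ["a", "b", "c"]
dense = ["b", "a", None, "c", "b"]
print(dictionary_encode(dense, dictionary))  # [1, 0, None, 2, 1]
```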





[jira] [Updated] (ARROW-1175) [Java] Implement/test dictionary-encoded subfields

2019-06-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1175:

Fix Version/s: (was: 0.14.0)
   1.0.0

> [Java] Implement/test dictionary-encoded subfields
> --
>
> Key: ARROW-1175
> URL: https://issues.apache.org/jira/browse/ARROW-1175
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> We do not have any tests about types like:
> {code}
> List
> {code}
> cc [~julienledem] [~elahrvivaz]





[jira] [Updated] (ARROW-1564) [C++] Kernel functions for computing minimum and maximum of an array in one pass

2019-06-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1564:

Fix Version/s: (was: 0.14.0)

> [C++] Kernel functions for computing minimum and maximum of an array in one 
> pass
> 
>
> Key: ARROW-1564
> URL: https://issues.apache.org/jira/browse/ARROW-1564
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: Analytics
>
> This is useful for determining whether a small-range integer O( n ) sort can 
> be used in some circumstances. Can also be used for simply computing array 
> statistics





[jira] [Updated] (ARROW-1562) [C++] Numeric kernel implementations for add (+)

2019-06-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1562:

Fix Version/s: (was: 0.14.0)

> [C++] Numeric kernel implementations for add (+)
> 
>
> Key: ARROW-1562
> URL: https://issues.apache.org/jira/browse/ARROW-1562
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: Analytics
>
> This function should respect consistent type promotions between types of 
> different sizes and signed and unsigned integers





[jira] [Commented] (ARROW-1175) [Java] Implement/test dictionary-encoded subfields

2019-06-11 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861430#comment-16861430
 ] 

Wes McKinney commented on ARROW-1175:
-

[~emkornfi...@gmail.com] I might put this on your radar for some point in the 
future, moving out of 0.14.0 for now

> [Java] Implement/test dictionary-encoded subfields
> --
>
> Key: ARROW-1175
> URL: https://issues.apache.org/jira/browse/ARROW-1175
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> We do not have any tests about types like:
> {code}
> List
> {code}
> cc [~julienledem] [~elahrvivaz]





[jira] [Updated] (ARROW-1621) [JAVA] Reduce Heap Usage per Vector

2019-06-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1621:

Fix Version/s: (was: 0.14.0)

> [JAVA] Reduce Heap Usage per Vector
> ---
>
> Key: ARROW-1621
> URL: https://issues.apache.org/jira/browse/ARROW-1621
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Siddharth Teotia
>Assignee: Siddharth Teotia
>Priority: Major
>
> https://docs.google.com/document/d/1MU-ah_bBHIxXNrd7SkwewGCOOexkXJ7cgKaCis5f-PI/edit





[jira] [Updated] (ARROW-1692) [Python, Java] UnionArray round trip not working

2019-06-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1692:

Fix Version/s: (was: 0.14.0)
   1.0.0

> [Python, Java] UnionArray round trip not working
> 
>
> Key: ARROW-1692
> URL: https://issues.apache.org/jira/browse/ARROW-1692
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java, Python
>Reporter: Philipp Moritz
>Assignee: Micah Kornfield
>Priority: Major
>  Labels: columnar-format-1.0
> Fix For: 1.0.0
>
> Attachments: union_array.arrow
>
>
> I'm currently working on making pyarrow.serialization data available from the 
> Java side. One problem I ran into is that the Java implementation seems 
> unable to read UnionArrays generated from C++. To make this easily 
> reproducible, I created a clean Python implementation for creating 
> UnionArrays: https://github.com/apache/arrow/pull/1216
> The data is generated with the following script:
> {code}
> import pyarrow as pa
> binary = pa.array([b'a', b'b', b'c', b'd'], type='binary')
> int64 = pa.array([1, 2, 3], type='int64')
> types = pa.array([0, 1, 0, 0, 1, 1, 0], type='int8')
> value_offsets = pa.array([0, 0, 2, 1, 1, 2, 3], type='int32')
> result = pa.UnionArray.from_arrays([binary, int64], types, value_offsets)
> batch = pa.RecordBatch.from_arrays([result], ["test"])
> sink = pa.BufferOutputStream()
> writer = pa.RecordBatchStreamWriter(sink, batch.schema)
> writer.write_batch(batch)
> sink.close()
> b = sink.get_result()
> with open("union_array.arrow", "wb") as f:
>     f.write(b)
> # Sanity check: Read the batch in again
> with open("union_array.arrow", "rb") as f:
>     b = f.read()
> reader = pa.RecordBatchStreamReader(pa.BufferReader(b))
> batch = reader.read_next_batch()
> print("union array is", batch.column(0))
> {code}
> I attached the file generated by that script. Then when I run the following 
> code in Java:
> {code}
> RootAllocator allocator = new RootAllocator(10);
> ByteArrayInputStream in = new 
> ByteArrayInputStream(Files.readAllBytes(Paths.get("union_array.arrow")));
> ArrowStreamReader reader = new ArrowStreamReader(in, allocator);
> reader.loadNextBatch()
> {code}
> I get the following error:
> {code}
> |  java.lang.IllegalArgumentException thrown: Could not load buffers for 
> field test: Union(Sparse, [22, 5])<0: Binary, 1: Int(64, true)>. error 
> message: can not truncate buffer to a larger size 7: 0
> |at VectorLoader.loadBuffers (VectorLoader.java:83)
> |at VectorLoader.load (VectorLoader.java:62)
> |at ArrowReader$1.visit (ArrowReader.java:125)
> |at ArrowReader$1.visit (ArrowReader.java:111)
> |at ArrowRecordBatch.accepts (ArrowRecordBatch.java:128)
> |at ArrowReader.loadNextBatch (ArrowReader.java:137)
> |at (#7:1)
> {code}
> It seems like Java is not picking up that the UnionArray is Dense instead of 
> Sparse. After changing the default in 
> java/vector/src/main/codegen/templates/UnionVector.java from Sparse to Dense, 
> I get this:
> {code}
> jshell> reader.getVectorSchemaRoot().getSchema()
> $9 ==> Schema [0])<: Int(64, true)>
> {code}
> but then reading doesn't work:
> {code}
> jshell> reader.loadNextBatch()
> |  java.lang.IllegalArgumentException thrown: Could not load buffers for 
> field list: Union(Dense, [1])<: Struct Int(64, true). error message: can not truncate buffer to a larger size 1: > 0
> |at VectorLoader.loadBuffers (VectorLoader.java:83)
> |at VectorLoader.load (VectorLoader.java:62)
> |at ArrowReader$1.visit (ArrowReader.java:125)
> |at ArrowReader$1.visit (ArrowReader.java:111)
> |at ArrowRecordBatch.accepts (ArrowRecordBatch.java:128)
> |at ArrowReader.loadNextBatch (ArrowReader.java:137)
> |at (#8:1)
> {code}
> Any help with this is appreciated!





[jira] [Updated] (ARROW-3802) [C++] Cast from integer to half float not implemented

2019-06-11 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3802:
--
Issue Type: Improvement  (was: Bug)

> [C++] Cast from integer to half float not implemented
> -
>
> Key: ARROW-3802
> URL: https://issues.apache.org/jira/browse/ARROW-3802
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Romain François
>Priority: Major
> Fix For: 0.15.0
>
>
> {code:java}
> library(reticulate)
> pa <- import("pyarrow")
> pa$array(c(1,2,3))$cast(pa$float16())
> #> Error in py_call_impl(callable, dots$args, dots$keywords): 
> ArrowNotImplementedError: No cast implemented from double to halffloat
> #> 
> #> Detailed traceback: 
> #> File "pyarrow/array.pxi", line 277, in pyarrow.lib.Array.cast 
> (/Users/travis/build/BryanCutler/arrow-dist/arrow/python/build/temp.macosx-10.6-intel-3.6/lib.cxx:30459)
> #> File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status 
> (/Users/travis/build/BryanCutler/arrow-dist/arrow/python/build/temp.macosx-10.6-intel-3.6/lib.cxx:8570)
> {code}





[jira] [Commented] (ARROW-5312) [C++] Move JSON integration testing utilities to arrow/testing and libarrow_testing.so

2019-06-11 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861434#comment-16861434
 ] 

Antoine Pitrou commented on ARROW-5312:
---

I don't know. But if we want to provide a Python binding we shouldn't 
complicate our lives by having to link against libarrow_testing, IMHO.

> [C++] Move JSON integration testing utilities to arrow/testing and 
> libarrow_testing.so
> --
>
> Key: ARROW-5312
> URL: https://issues.apache.org/jira/browse/ARROW-5312
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> It's not necessary to have this code in libarrow.so. Let's tackle after 
> ARROW-3144 and ARROW-835





[jira] [Updated] (ARROW-1797) [C++] Implement binary arithmetic kernels for numeric arrays

2019-06-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1797:

Fix Version/s: (was: 0.14.0)

> [C++] Implement binary arithmetic kernels for numeric arrays
> 
>
> Key: ARROW-1797
> URL: https://issues.apache.org/jira/browse/ARROW-1797
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: analytics
>






[jira] [Commented] (ARROW-1875) Write 64-bit ints as strings in integration test JSON files

2019-06-11 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861436#comment-16861436
 ] 

Wes McKinney commented on ARROW-1875:
-

[~emkornfi...@gmail.com] [~paul.e.taylor] any chance the three of us could 
collude to make this change? I can help with the C++ work if there is Java and 
JS work to happen at the same time

> Write 64-bit ints as strings in integration test JSON files
> ---
>
> Key: ARROW-1875
> URL: https://issues.apache.org/jira/browse/ARROW-1875
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Integration, JavaScript
>Reporter: Brian Hulette
>Priority: Minor
> Fix For: 0.14.0
>
>
> Javascript can't handle 64-bit integers natively, so writing them as strings 
> in the JSON would make implementing the integration tests a lot simpler.
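
The precision problem and the proposed workaround can be shown with a short
Python sketch. Python integers are arbitrary precision, so the double
conversion below stands in for what a JavaScript JSON parser does with
numbers. The `"DATA"` key and the helper names are illustrative assumptions,
not a statement of the final integration-test file layout.

```python
import json

# 2**53 + 1 is the first integer an IEEE-754 double cannot represent exactly.
big = 2**53 + 1
print(int(float(big)) == big)  # False: the value is rounded to 2**53

def encode_int64_column(values):
    """Write 64-bit integers as JSON strings so no reader rounds them."""
    return json.dumps({"DATA": [str(v) for v in values]})

def decode_int64_column(text):
    """Parse the string-encoded integers back into exact integers."""
    return [int(s) for s in json.loads(text)["DATA"]]

column = [2**63 - 1, -2**63, big]
text = encode_int64_column(column)
print(text)
print(decode_int64_column(text) == column)  # True
```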





[jira] [Updated] (ARROW-1956) [Python] Support reading specific partitions from a partitioned parquet dataset

2019-06-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1956:

Fix Version/s: (was: 0.14.0)
   1.0.0

> [Python] Support reading specific partitions from a partitioned parquet 
> dataset
> ---
>
> Key: ARROW-1956
> URL: https://issues.apache.org/jira/browse/ARROW-1956
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.8.0
> Environment: Kernel: 4.14.8-300.fc27.x86_64
> Python: 3.6.3
>Reporter: Suvayu Ali
>Priority: Minor
>  Labels: parquet
> Fix For: 1.0.0
>
> Attachments: so-example.py
>
>
> I want to read specific partitions from a partitioned parquet dataset.  This 
> is very useful in case of large datasets.  I have attached a small script 
> that creates a dataset and shows what is expected when reading (quoting 
> salient points below).
> # There is no way to read specific partitions in Pandas
> # In pyarrow I tried to achieve the goal by providing a list of 
> files/directories to ParquetDataset, but it didn't work: 
> # In PySpark it works if I simply do:
> {code:none}
> spark.read.option('basePath', 'datadir').parquet(*list_of_partitions)
> {code}
> I also couldn't find a way to easily write partitioned parquet files.  In the 
> end I did it by hand by creating the directory hierarchies, and writing the 
> individual files myself (similar to the implementation in the attached 
> script).  Again, in PySpark I can do 
> {code:none}
> df.write.partitionBy(*list_of_partitions).parquet(output)
> {code}
> to achieve that.





[jira] [Updated] (ARROW-5270) [C++] Reenable Valgrind on Travis-CI

2019-06-11 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5270:
--
Priority: Minor  (was: Critical)

> [C++] Reenable Valgrind on Travis-CI
> 
>
> Key: ARROW-5270
> URL: https://issues.apache.org/jira/browse/ARROW-5270
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Running Valgrind on Travis-CI was disabled in ARROW-4611 (apparently because 
> of issues within the re2 library).
> We should reenable it at some point in order to exercise the reliability of 
> our C++ code.
> (and/or have a build with another piece of instrumentation enabled such as 
> ASAN)





[jira] [Updated] (ARROW-2006) [C++] Add option to trim excess padding when writing IPC messages

2019-06-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2006:

Fix Version/s: (was: 0.14.0)

> [C++] Add option to trim excess padding when writing IPC messages
> -
>
> Key: ARROW-2006
> URL: https://issues.apache.org/jira/browse/ARROW-2006
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> This will help with situations like 
> [https://github.com/apache/arrow/issues/1467] where we don't really need the 
> extra padding bytes





[jira] [Updated] (ARROW-5270) [C++] Reenable Valgrind on Travis-CI

2019-06-11 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5270:
--
Fix Version/s: 0.15.0  (was: 0.14.0)

> [C++] Reenable Valgrind on Travis-CI
> 
>
> Key: ARROW-5270
> URL: https://issues.apache.org/jira/browse/ARROW-5270
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
> Reporter: Antoine Pitrou
> Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Running Valgrind on Travis-CI was disabled in ARROW-4611 (apparently because 
> of issues within the re2 library).
> We should reenable it at some point in order to exercise the reliability of 
> our C++ code.
> (and/or have a build with another piece of instrumentation enabled such as 
> ASAN)





[jira] [Updated] (ARROW-5270) [C++] Reenable Valgrind on Travis-CI

2019-06-11 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5270:
--
Issue Type: Improvement  (was: Bug)

> [C++] Reenable Valgrind on Travis-CI
> 
>
> Key: ARROW-5270
> URL: https://issues.apache.org/jira/browse/ARROW-5270
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
> Reporter: Antoine Pitrou
> Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Running Valgrind on Travis-CI was disabled in ARROW-4611 (apparently because 
> of issues within the re2 library).
> We should reenable it at some point in order to exercise the reliability of 
> our C++ code.
> (and/or have a build with another piece of instrumentation enabled such as 
> ASAN)




