[jira] [Resolved] (ARROW-6598) [Java] Sort the code for ApproxEqualsVisitor

2019-10-24 Thread Micah Kornfield (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield resolved ARROW-6598.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5418
[https://github.com/apache/arrow/pull/5418]

> [Java] Sort the code for ApproxEqualsVisitor
> 
>
> Key: ARROW-6598
> URL: https://issues.apache.org/jira/browse/ARROW-6598
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> As a follow-up to ARROW-6458, we finalize the code for 
> ApproxEqualsVisitor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6989) [Python][C++] Assert is triggered when decimal type inference occurs on a value with out of range precision

2019-10-24 Thread Micah Kornfield (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield updated ARROW-6989:
---
Component/s: C++

> [Python][C++] Assert is triggered when decimal type inference occurs on a 
> value with out of range precision
> ---
>
> Key: ARROW-6989
> URL: https://issues.apache.org/jira/browse/ARROW-6989
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Micah Kornfield
>Priority: Major
>
> Example:
> pa.array([decimal.Decimal(123.234)] )
>  
> The problem is that inference.cc calls the direct constructor for decimal 
> types instead of using Make.





[jira] [Created] (ARROW-6990) [C++] Support casting between decimal types with compatible precision/scales

2019-10-24 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-6990:
--

 Summary: [C++] Support casting between decimal types with 
compatible precision/scales
 Key: ARROW-6990
 URL: https://issues.apache.org/jira/browse/ARROW-6990
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Micah Kornfield


This seems like a reasonable thing to support; it came up as a question on 
the user mailing list (by way of some Python code).
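
The issue does not define what "compatible" means. One plausible reading, assumed here purely for illustration, is that a cast from decimal(p1, s1) to decimal(p2, s2) is lossless when the target keeps at least as many fractional digits and at least as many integer digits. The helper names below are hypothetical and are not Arrow APIs; the sketch uses only Python's standard decimal module.

```python
import decimal


def is_lossless_decimal_cast(src_precision, src_scale, dst_precision, dst_scale):
    """Hypothetical check: does every decimal(src) value fit in decimal(dst)?"""
    src_integer_digits = src_precision - src_scale
    dst_integer_digits = dst_precision - dst_scale
    # No fractional digits may be dropped, and no integer digits may overflow.
    return dst_scale >= src_scale and dst_integer_digits >= src_integer_digits


def rescale(value, dst_scale):
    """Re-express `value` with exactly `dst_scale` fractional digits."""
    quantum = decimal.Decimal(1).scaleb(-dst_scale)  # e.g. 0.0001 for scale 4
    return value.quantize(quantum)


# decimal(5, 2) -> decimal(7, 4): two extra fractional digits, same integer digits.
print(is_lossless_decimal_cast(5, 2, 7, 4))
print(rescale(decimal.Decimal("12.34"), 4))
```

Under this reading, widening the scale without widening the precision by the same amount would be rejected, because it leaves fewer integer digits than the source type allows.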





[jira] [Created] (ARROW-6989) [Python][C++] Assert is triggered when decimal type inference occurs on a value with out of range precision

2019-10-24 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-6989:
--

 Summary: [Python][C++] Assert is triggered when decimal type 
inference occurs on a value with out of range precision
 Key: ARROW-6989
 URL: https://issues.apache.org/jira/browse/ARROW-6989
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Micah Kornfield


Example:
pa.array([decimal.Decimal(123.234)] )
 
The problem is that inference.cc calls the direct constructor for decimal types 
instead of using Make.
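
To see why this example can produce an out-of-range precision: decimal.Decimal(123.234) is built from a float, so it holds the exact binary expansion of that float rather than the six digits typed. The sketch below uses only the standard decimal module; the helper decimal_precision is hypothetical, and the limit of 38 is the maximum precision of a 128-bit decimal. It handles finite values only.

```python
import decimal

# Maximum precision of a 128-bit decimal type.
MAX_DECIMAL128_PRECISION = 38


def decimal_precision(value):
    """Return the number of digits needed to store a finite `value` exactly."""
    _sign, digits, exponent = value.as_tuple()
    if exponent >= 0:
        # Trailing zeros implied by a positive exponent count toward precision.
        return len(digits) + exponent
    # With a negative exponent, at least -exponent digits are required.
    return max(len(digits), -exponent)


from_float = decimal.Decimal(123.234)     # exact expansion of the binary float
from_string = decimal.Decimal("123.234")  # exactly the digits as written

print(decimal_precision(from_float), decimal_precision(from_string))
print(decimal_precision(from_float) > MAX_DECIMAL128_PRECISION)
```

Constructing the value from a string keeps the precision at the number of digits written, which is one way to avoid the problem on the caller's side.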





[jira] [Commented] (ARROW-6988) [CI][R] Buildbot's R Conda is failing

2019-10-24 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959392#comment-16959392
 ] 

Neal Richardson commented on ARROW-6988:


Everywhere else the R builds are passing, including the Conda R job on 
Krisztián's docker-compose/GitHub-Actions branch, so I'm not yet convinced this 
is a real problem.

> [CI][R] Buildbot's R Conda is failing
> -
>
> Key: ARROW-6988
> URL: https://issues.apache.org/jira/browse/ARROW-6988
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Francois Saint-Jacques
>Priority: Major
>
> {code:java}
>   Running ‘testthat.R’
>  ERROR
> Running the tests in ‘tests/testthat.R’ failed.
> Last 13 lines of output:
>   25: tryCatch(withCallingHandlers({eval(code, test_env)if (!handled 
> && !is.null(test)) {skip_empty()}}, expectation = 
> handle_expectation, skip = handle_skip, warning = handle_warning, message 
> = handle_message, error = handle_error), error = handle_fatal, skip = 
> function(e) {})
>   26: test_code(NULL, exprs, env)
>   27: source_file(path, new.env(parent = env), chdir = TRUE, wrap = wrap)
>   28: force(code)
>   29: with_reporter(reporter = reporter, start_end_reporter = 
> start_end_reporter, {reporter$start_file(basename(path))
> lister$start_file(basename(path))source_file(path, new.env(parent = 
> env), chdir = TRUE, wrap = wrap)reporter$.end_context()   
>  reporter$end_file()})
>   30: FUN(X[[i]], ...)
>   31: lapply(paths, test_file, env = env, reporter = current_reporter, 
> start_end_reporter = FALSE, load_helpers = FALSE, wrap = wrap)
>   32: force(code)
>   33: with_reporter(reporter = current_reporter, results <- lapply(paths, 
> test_file, env = env, reporter = current_reporter, start_end_reporter = 
> FALSE, load_helpers = FALSE, wrap = wrap))
>   34: test_files(paths, reporter = reporter, env = env, stop_on_failure = 
> stop_on_failure, stop_on_warning = stop_on_warning, wrap = wrap)
>   35: test_dir(path = test_path, reporter = reporter, env = env, filter = 
> filter, ..., stop_on_failure = stop_on_failure, stop_on_warning = 
> stop_on_warning, wrap = wrap)
>   36: test_package_dir(package = package, test_path = test_path, filter = 
> filter, reporter = reporter, ..., stop_on_failure = stop_on_failure, 
> stop_on_warning = stop_on_warning, wrap = wrap)
>   37: test_check("arrow")
>   An irrecoverable exception occurred. R is aborting now ...
>   Segmentation fault (core dumped)
> * checking for unstated dependencies in vignettes ... OK
> * checking package vignettes in ‘inst/doc’ ... OK
> * checking re-building of vignette outputs ... OK
> * DONE
> Status: 1 ERROR, 1 WARNING, 2 NOTEs
> See
>   ‘/buildbot/AMD64_Conda_R/r/arrow.Rcheck/00check.log’
> for details.
>  {code}
> [https://ci.ursalabs.org/#/builders/95/builds/2386] 
> [https://ci.ursalabs.org/#/builders/95]





[jira] [Resolved] (ARROW-6969) [C++][Dataset] ParquetScanTask eagerly load file

2019-10-24 Thread Francois Saint-Jacques (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques resolved ARROW-6969.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5725
[https://github.com/apache/arrow/pull/5725]

> [C++][Dataset] ParquetScanTask eagerly load file 
> -
>
> Key: ARROW-6969
> URL: https://issues.apache.org/jira/browse/ARROW-6969
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Dataset
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: dataset, pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The file content should only be read when invoking ParquetScanTask::Scan, not 
> on construction. Loading eagerly prevents reading in a true streaming fashion 
> under memory constraints.
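
The difference between eager and lazy loading can be sketched as follows. This is an illustration in Python under stated assumptions, not the C++ ParquetScanTask: LazyScanTask, its chunk_size parameter, and the plain binary file are all hypothetical stand-ins. Construction only records the path; the file is opened when iteration over scan() begins, and one chunk is held at a time.

```python
import os
import tempfile


class LazyScanTask:
    """Hypothetical scan task that defers all I/O until scan() is iterated."""

    def __init__(self, path, chunk_size=1 << 16):
        # Only bookkeeping happens here; the file is not opened or read.
        self.path = path
        self.chunk_size = chunk_size

    def scan(self):
        # Generator: the file is opened on first iteration and read chunk by chunk.
        with open(self.path, "rb") as source:
            while True:
                chunk = source.read(self.chunk_size)
                if not chunk:
                    return
                yield chunk


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as sink:
        sink.write(b"x" * 10)
    task = LazyScanTask(path, chunk_size=4)  # no I/O yet
    sizes = [len(chunk) for chunk in task.scan()]
    print(sizes)
```

Because construction is cheap, many such tasks can be created up front while memory use stays bounded by the chunks currently being consumed.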





[jira] [Updated] (ARROW-6969) [C++][Dataset] ParquetScanTask eagerly load file

2019-10-24 Thread Francois Saint-Jacques (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques updated ARROW-6969:
--
Component/s: C++

> [C++][Dataset] ParquetScanTask eagerly load file 
> -
>
> Key: ARROW-6969
> URL: https://issues.apache.org/jira/browse/ARROW-6969
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Dataset
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: dataset, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The file content should only be read when invoking ParquetScanTask::Scan, not 
> on construction. Loading eagerly prevents reading in a true streaming fashion 
> under memory constraints.





[jira] [Resolved] (ARROW-6964) [C++][Dataset] Expose a nested parallel option for Scanner::ToTable

2019-10-24 Thread Francois Saint-Jacques (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques resolved ARROW-6964.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5721
[https://github.com/apache/arrow/pull/5721]

> [C++][Dataset] Expose a nested parallel option for Scanner::ToTable
> ---
>
> Key: ARROW-6964
> URL: https://issues.apache.org/jira/browse/ARROW-6964
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Dataset
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: dataset, pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-6964) [C++][Dataset] Expose a nested parallel option for Scanner::ToTable

2019-10-24 Thread Francois Saint-Jacques (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques updated ARROW-6964:
--
Component/s: C++

> [C++][Dataset] Expose a nested parallel option for Scanner::ToTable
> ---
>
> Key: ARROW-6964
> URL: https://issues.apache.org/jira/browse/ARROW-6964
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Dataset
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: dataset, pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-6966) [Go] 32bit memset is null

2019-10-24 Thread Francois Saint-Jacques (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques updated ARROW-6966:
--
Component/s: Go

> [Go] 32bit memset is null
> -
>
> Key: ARROW-6966
> URL: https://issues.apache.org/jira/browse/ARROW-6966
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Go
>Reporter: Jonathan A Sternberg
>Assignee: Jonathan A Sternberg
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> If you use a function that calls `memset.Set`, the implementation on a 32 bit 
> machine seems to be unset. This happened in our 32 bit build here:
> [https://circleci.com/gh/influxdata/influxdb/66112#tests/containers/2]
> {code:java}
> goroutine 66 [running]:
> testing.tRunner.func1(0x9e1f2c0)
>     /usr/local/go/src/testing/testing.go:830 +0x30e
> panic(0x899cb40, 0x9403c40)
>     /usr/local/go/src/runtime/panic.go:522 +0x16e
> github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/memory.Set(...)
>     /root/go/src/github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/memory/memory.go:25
> github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/array.(*builder).init(0x9e44990, 0x20)
>     /root/go/src/github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/array/builder.go:101 +0xc7
> github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/array.(*Int64Builder).init(0x9e44990, 0x20)
>     /root/go/src/github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/array/numericbuilder.gen.go:102 +0x2f
> github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/array.(*Int64Builder).Resize(0x9e44990, 0x2)
>     /root/go/src/github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/array/numericbuilder.gen.go:125 +0x42
> github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/array.(*builder).reserve(0x9e44990, 0x1, 0x9c52464)
>     /root/go/src/github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/array/builder.go:138 +0x72
> github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/array.(*Int64Builder).Reserve(0x9e44990, 0x1)
>     /root/go/src/github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/array/numericbuilder.gen.go:113 +0x51
> github.com/influxdata/influxdb/vendor/github.com/influxdata/flux/arrow.NewInt(0x9e4a770, 0x1, 0x1, 0x0, 0x89f0360)
>     /root/go/src/github.com/influxdata/influxdb/vendor/github.com/influxdata/flux/arrow/int.go:10 +0x6c
> github.com/influxdata/influxdb/storage/reads.(*floatTable).advance(0x9e42070, 0x0)
>     /root/go/src/github.com/influxdata/influxdb/storage/reads/table.gen.go:91 +0x7e
> github.com/influxdata/influxdb/storage/reads.newFloatTable(0x9e17740, 0xe521a160, 0x9e1b8c0, 0x0, 0x0, 0x1e, 0x0, 0x8c13be0, 0x9e448a0, 0x9e448d0, ...)
>     /root/go/src/github.com/influxdata/influxdb/storage/reads/table.gen.go:47 +0x1c2
> github.com/influxdata/influxdb/storage/reads.(*filterIterator).handleRead(0x9e22840, 0x9e0d1a0, 0x8c0ce00, 0x9e48780, 0x0, 0x0)
>     /root/go/src/github.com/influxdata/influxdb/storage/reads/reader.go:177 +0x755
> github.com/influxdata/influxdb/storage/reads.(*filterIterator).Do(0x9e22840, 0x9e0d170, 0x9c40070, 0x0)
>     /root/go/src/github.com/influxdata/influxdb/storage/reads/reader.go:140 +0x138
> github.com/influxdata/influxdb/storage/reads_test.TestDuplicateKeys_ReadFilter(0x9e1f2c0)
>     /root/go/src/github.com/influxdata/influxdb/storage/reads/reader_test.go:89 +0x1df
> testing.tRunner(0x9e1f2c0, 0x8ad44e4)
>     /usr/local/go/src/testing/testing.go:865 +0x97
> created by testing.(*T).Run
>     /usr/local/go/src/testing/testing.go:916 +0x2b2
> {code}
> I added a print statement at the point where memset is called, to print the 
> function being used, and got this:
> {code}
>  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] 0
> {code}
> If I set {{memset}} with a default, the code that calls into this works fine.
>  





[jira] [Updated] (ARROW-6956) [C++] Status should use unique_ptr

2019-10-24 Thread Francois Saint-Jacques (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques updated ARROW-6956:
--
Component/s: C++

> [C++] Status should use unique_ptr
> --
>
> Key: ARROW-6956
> URL: https://issues.apache.org/jira/browse/ARROW-6956
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Francois Saint-Jacques
>Priority: Minor
>
> The logic of Status::State is _very_ similar to that of unique_ptr, except 
> that it performs a deep copy on copy.





[jira] [Updated] (ARROW-6988) [CI][R] Buildbot's R Conda is failing

2019-10-24 Thread Francois Saint-Jacques (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques updated ARROW-6988:
--
Component/s: R
 Continuous Integration

> [CI][R] Buildbot's R Conda is failing
> -
>
> Key: ARROW-6988
> URL: https://issues.apache.org/jira/browse/ARROW-6988
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Francois Saint-Jacques
>Priority: Major
>
> {code:java}
>   Running ‘testthat.R’
>  ERROR
> Running the tests in ‘tests/testthat.R’ failed.
> Last 13 lines of output:
>   25: tryCatch(withCallingHandlers({eval(code, test_env)if (!handled 
> && !is.null(test)) {skip_empty()}}, expectation = 
> handle_expectation, skip = handle_skip, warning = handle_warning, message 
> = handle_message, error = handle_error), error = handle_fatal, skip = 
> function(e) {})
>   26: test_code(NULL, exprs, env)
>   27: source_file(path, new.env(parent = env), chdir = TRUE, wrap = wrap)
>   28: force(code)
>   29: with_reporter(reporter = reporter, start_end_reporter = 
> start_end_reporter, {reporter$start_file(basename(path))
> lister$start_file(basename(path))source_file(path, new.env(parent = 
> env), chdir = TRUE, wrap = wrap)reporter$.end_context()   
>  reporter$end_file()})
>   30: FUN(X[[i]], ...)
>   31: lapply(paths, test_file, env = env, reporter = current_reporter, 
> start_end_reporter = FALSE, load_helpers = FALSE, wrap = wrap)
>   32: force(code)
>   33: with_reporter(reporter = current_reporter, results <- lapply(paths, 
> test_file, env = env, reporter = current_reporter, start_end_reporter = 
> FALSE, load_helpers = FALSE, wrap = wrap))
>   34: test_files(paths, reporter = reporter, env = env, stop_on_failure = 
> stop_on_failure, stop_on_warning = stop_on_warning, wrap = wrap)
>   35: test_dir(path = test_path, reporter = reporter, env = env, filter = 
> filter, ..., stop_on_failure = stop_on_failure, stop_on_warning = 
> stop_on_warning, wrap = wrap)
>   36: test_package_dir(package = package, test_path = test_path, filter = 
> filter, reporter = reporter, ..., stop_on_failure = stop_on_failure, 
> stop_on_warning = stop_on_warning, wrap = wrap)
>   37: test_check("arrow")
>   An irrecoverable exception occurred. R is aborting now ...
>   Segmentation fault (core dumped)
> * checking for unstated dependencies in vignettes ... OK
> * checking package vignettes in ‘inst/doc’ ... OK
> * checking re-building of vignette outputs ... OK
> * DONE
> Status: 1 ERROR, 1 WARNING, 2 NOTEs
> See
>   ‘/buildbot/AMD64_Conda_R/r/arrow.Rcheck/00check.log’
> for details.
>  {code}
> [https://ci.ursalabs.org/#/builders/95/builds/2386] 
> [https://ci.ursalabs.org/#/builders/95]





[jira] [Created] (ARROW-6988) [CI][R] Buildbot's R Conda is failing

2019-10-24 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6988:
-

 Summary: [CI][R] Buildbot's R Conda is failing
 Key: ARROW-6988
 URL: https://issues.apache.org/jira/browse/ARROW-6988
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


{code:java}
  Running ‘testthat.R’
 ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
  25: tryCatch(withCallingHandlers({eval(code, test_env)if (!handled && 
!is.null(test)) {skip_empty()}}, expectation = handle_expectation, 
skip = handle_skip, warning = handle_warning, message = handle_message, 
error = handle_error), error = handle_fatal, skip = function(e) {})
  26: test_code(NULL, exprs, env)
  27: source_file(path, new.env(parent = env), chdir = TRUE, wrap = wrap)
  28: force(code)
  29: with_reporter(reporter = reporter, start_end_reporter = 
start_end_reporter, {reporter$start_file(basename(path))
lister$start_file(basename(path))source_file(path, new.env(parent = 
env), chdir = TRUE, wrap = wrap)reporter$.end_context() 
   reporter$end_file()})
  30: FUN(X[[i]], ...)
  31: lapply(paths, test_file, env = env, reporter = current_reporter, 
start_end_reporter = FALSE, load_helpers = FALSE, wrap = wrap)
  32: force(code)
  33: with_reporter(reporter = current_reporter, results <- lapply(paths, 
test_file, env = env, reporter = current_reporter, start_end_reporter = FALSE,  
   load_helpers = FALSE, wrap = wrap))
  34: test_files(paths, reporter = reporter, env = env, stop_on_failure = 
stop_on_failure, stop_on_warning = stop_on_warning, wrap = wrap)
  35: test_dir(path = test_path, reporter = reporter, env = env, filter = 
filter, ..., stop_on_failure = stop_on_failure, stop_on_warning = 
stop_on_warning, wrap = wrap)
  36: test_package_dir(package = package, test_path = test_path, filter = 
filter, reporter = reporter, ..., stop_on_failure = stop_on_failure, 
stop_on_warning = stop_on_warning, wrap = wrap)
  37: test_check("arrow")
  An irrecoverable exception occurred. R is aborting now ...
  Segmentation fault (core dumped)
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in ‘inst/doc’ ... OK
* checking re-building of vignette outputs ... OK
* DONE
Status: 1 ERROR, 1 WARNING, 2 NOTEs
See
  ‘/buildbot/AMD64_Conda_R/r/arrow.Rcheck/00check.log’
for details.
 {code}
[https://ci.ursalabs.org/#/builders/95/builds/2386] 
[https://ci.ursalabs.org/#/builders/95]





[jira] [Updated] (ARROW-6987) [CI] Travis OSX failing to install sdk headers

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6987:
--
Labels: pull-request-available  (was: )

> [CI] Travis OSX failing to install sdk headers
> --
>
> Key: ARROW-6987
> URL: https://issues.apache.org/jira/browse/ARROW-6987
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Francois Saint-Jacques
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> sudo installer -pkg /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg -target /
> installer: Package name is macOS_SDK_headers_for_macOS_10.14
> installer: Certificate used to sign package is not trusted. Use -allowUntrusted to override.
> The command "$TRAVIS_BUILD_DIR/ci/travis_before_script_cpp.sh --only-library --homebrew" failed and exited with 1 during .
> {code}
> See [https://travis-ci.org/apache/arrow/jobs/602434884#L342-L345]





[jira] [Created] (ARROW-6987) [CI] Travis OSX failing to install sdk headers

2019-10-24 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6987:
-

 Summary: [CI] Travis OSX failing to install sdk headers
 Key: ARROW-6987
 URL: https://issues.apache.org/jira/browse/ARROW-6987
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Francois Saint-Jacques


{code:java}
sudo installer -pkg /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg -target /
installer: Package name is macOS_SDK_headers_for_macOS_10.14
installer: Certificate used to sign package is not trusted. Use -allowUntrusted to override.
The command "$TRAVIS_BUILD_DIR/ci/travis_before_script_cpp.sh --only-library --homebrew" failed and exited with 1 during .
{code}
See [https://travis-ci.org/apache/arrow/jobs/602434884#L342-L345]





[jira] [Updated] (ARROW-6962) [C++] [CI] Stop compiling with -Weverything

2019-10-24 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-6962:
---
Fix Version/s: 0.15.1

> [C++] [CI] Stop compiling with -Weverything
> ---
>
> Key: ARROW-6962
> URL: https://issues.apache.org/jira/browse/ARROW-6962
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0, 0.15.1
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> We should simply use {{-Wall}} instead.
> [https://quuxplusone.github.io/blog/2018/12/06/dont-use-weverything/]





[jira] [Updated] (ARROW-6984) [C++] Update LZ4 to 1.9.2 for CVE-2019-17543

2019-10-24 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-6984:
---
Fix Version/s: (was: 0.15.1)
   1.0.0

> [C++] Update LZ4 to 1.9.2 for CVE-2019-17543
> 
>
> Key: ARROW-6984
> URL: https://issues.apache.org/jira/browse/ARROW-6984
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Affects Versions: 0.15.0
>Reporter: Sangeeth Keeriyadath
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> A reported CVE states that LZ4 before 1.9.2 has a heap-based buffer 
> overflow in LZ4_write32 (details: 
> [https://nvd.nist.gov/vuln/detail/CVE-2019-17543] ). Apache Arrow currently 
> uses version *1.8.3* ( 
> [https://github.com/apache/arrow/blob/47e5ecafa72b70112a64a1174b29b9db45f803ef/cpp/thirdparty/versions.txt#L38]
>  ).
> We need to bump the LZ4 dependency to *1.9.2* to address the reported CVE. 
> Thank you!





[jira] [Updated] (ARROW-6986) [R] Add basic Expression class

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6986:
--
Labels: pull-request-available  (was: )

> [R] Add basic Expression class
> --
>
> Key: ARROW-6986
> URL: https://issues.apache.org/jira/browse/ARROW-6986
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> I started this as part of ARROW-6980 but it proved not necessary. This will 
> be a foundation for ARROW-6982, in addition to being useful on its own.





[jira] [Created] (ARROW-6986) [R] Add basic Expression class

2019-10-24 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6986:
--

 Summary: [R] Add basic Expression class
 Key: ARROW-6986
 URL: https://issues.apache.org/jira/browse/ARROW-6986
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


I started this as part of ARROW-6980 but it proved not necessary. This will be 
a foundation for ARROW-6982, in addition to being useful on its own.





[jira] [Resolved] (ARROW-6907) [C++][Plasma] Allow Plasma store to batch notifications to clients

2019-10-24 Thread Philipp Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-6907.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5626
[https://github.com/apache/arrow/pull/5626]

> [C++][Plasma] Allow Plasma store to batch notifications to clients
> --
>
> Key: ARROW-6907
> URL: https://issues.apache.org/jira/browse/ARROW-6907
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Reporter: Danyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-6983) [C++] Threaded task group crashes sometimes

2019-10-24 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-6983:
---
Fix Version/s: 0.15.1

> [C++] Threaded task group crashes sometimes
> ---
>
> Key: ARROW-6983
> URL: https://issues.apache.org/jira/browse/ARROW-6983
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0, 0.15.1
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> You can give this a more descriptive title :)
> See discussion on ARROW-6977. 
> https://gist.github.com/pitrou/87f3091c226db3306c45b2c32dd9aea8 seems to fix 
> it.





[jira] [Updated] (ARROW-6963) [Packaging][Wheel][OSX] Use crossbow's command to deploy artifacts from travis builds

2019-10-24 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-6963:
---
Fix Version/s: 0.15.1

> [Packaging][Wheel][OSX] Use crossbow's command to deploy artifacts from 
> travis builds
> -
>
> Key: ARROW-6963
> URL: https://issues.apache.org/jira/browse/ARROW-6963
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0, 0.15.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Travis has started to fail more often during artifact deployment to GitHub 
> releases.
> Crossbow has a built-in command to upload the artifacts, which is more 
> reliable.
> All of the Travis builds should use the crossbow script instead of relying on 
> Travis's deployment feature.





[jira] [Resolved] (ARROW-6963) [Packaging][Wheel][OSX] Use crossbow's command to deploy artifacts from travis builds

2019-10-24 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-6963.

Fix Version/s: (was: 0.15.1)
   Resolution: Fixed

Issue resolved by pull request 5726
[https://github.com/apache/arrow/pull/5726]

> [Packaging][Wheel][OSX] Use crossbow's command to deploy artifacts from 
> travis builds
> -
>
> Key: ARROW-6963
> URL: https://issues.apache.org/jira/browse/ARROW-6963
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Travis has started to fail more often during artifact deployment to GitHub 
> releases.
> Crossbow has a built-in command to upload the artifacts, which is more 
> reliable.
> All of the Travis builds should use the crossbow script instead of relying on 
> Travis's deployment feature.





[jira] [Updated] (ARROW-6963) [Packaging][Wheel][OSX] Use crossbow's command to deploy artifacts from travis builds

2019-10-24 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-6963:
---
Summary: [Packaging][Wheel][OSX] Use crossbow's command to deploy artifacts 
from travis builds  (was: [Packaging] Use crossbow's command to deploy 
artifacts from travis builds)

> [Packaging][Wheel][OSX] Use crossbow's command to deploy artifacts from 
> travis builds
> -
>
> Key: ARROW-6963
> URL: https://issues.apache.org/jira/browse/ARROW-6963
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Travis has started to fail more often during artifact deployment to GitHub 
> releases.
> Crossbow has a built-in command to upload the artifacts, which is more 
> reliable.
> All of the Travis builds should use the crossbow script instead of relying on 
> Travis's deployment feature.





[jira] [Assigned] (ARROW-6963) [Packaging] Use crossbow's command to deploy artifacts from travis builds

2019-10-24 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-6963:
--

Assignee: Krisztian Szucs

> [Packaging] Use crossbow's command to deploy artifacts from travis builds
> -
>
> Key: ARROW-6963
> URL: https://issues.apache.org/jira/browse/ARROW-6963
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Travis has started failing more often during artifact deployment to GitHub 
> releases.
> Crossbow has a built-in command to upload the artifacts, which is more reliable.
> All of the Travis builds should use the crossbow script instead of relying on 
> Travis's deployment feature.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6963) [Packaging][Wheel][OSX] Use crossbow's command to deploy artifacts from travis builds

2019-10-24 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-6963:
---
Fix Version/s: 0.15.1
   1.0.0

> [Packaging][Wheel][OSX] Use crossbow's command to deploy artifacts from 
> travis builds
> -
>
> Key: ARROW-6963
> URL: https://issues.apache.org/jira/browse/ARROW-6963
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0, 0.15.1
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Travis has started failing more often during artifact deployment to GitHub 
> releases.
> Crossbow has a built-in command to upload the artifacts, which is more reliable.
> All of the Travis builds should use the crossbow script instead of relying on 
> Travis's deployment feature.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6983) [C++] Threaded task group crashes sometimes

2019-10-24 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman resolved ARROW-6983.
-
Fix Version/s: (was: 0.15.1)
   Resolution: Fixed

Issue resolved by pull request 5724
[https://github.com/apache/arrow/pull/5724]

> [C++] Threaded task group crashes sometimes
> ---
>
> Key: ARROW-6983
> URL: https://issues.apache.org/jira/browse/ARROW-6983
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> You can give this a more descriptive title :)
> See discussion on ARROW-6977. 
> https://gist.github.com/pitrou/87f3091c226db3306c45b2c32dd9aea8 seems to fix 
> it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-24 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-6977:
---
Fix Version/s: 0.15.1

> [C++] Only enable jemalloc background_thread if feature is supported
> 
>
> Key: ARROW-6977
> URL: https://issues.apache.org/jira/browse/ARROW-6977
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: macOS 10.14, Homebrew
>Reporter: Neal Richardson
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0, 0.15.1
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Followup to ARROW-6910. When loading the R package after that patch merged, I 
> get this new message:
> {code}
> $ R
> > library(arrow)
> : option background_thread currently supports pthread only
> {code}
> https://github.com/jemalloc/jemalloc/blob/3d84bd57f4954a17059bd31330ec87d3c1876411/src/background_thread.c#L884-L887
>  is where the message comes from. Tracing that further, 
> {{have_background_thread}} comes from 
> https://github.com/jemalloc/jemalloc/blob/21cfe59ff7b10a61dabe26cd3dbfb7a255e1f5e8/include/jemalloc/internal/jemalloc_preamble.h.in#L205-L211,
>  which gets set in {{configure.ac}} here: 
> https://github.com/jemalloc/jemalloc/blob/d2dddfb82aac9f2212922eb90324e84790704bfe/configure.ac#L2155-L2157
> In sum, on my system, that flag doesn't get set, so 
> {{have_background_thread}} is false, and when that is false and the 
> {{background_thread}} option is true, I get that message printed. And I do 
> not want to see that message.
> cc [~wesm]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-24 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-6977.

Fix Version/s: (was: 0.15.1)
   Resolution: Fixed

Issue resolved by pull request 5729
[https://github.com/apache/arrow/pull/5729]

> [C++] Only enable jemalloc background_thread if feature is supported
> 
>
> Key: ARROW-6977
> URL: https://issues.apache.org/jira/browse/ARROW-6977
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: macOS 10.14, Homebrew
>Reporter: Neal Richardson
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Followup to ARROW-6910. When loading the R package after that patch merged, I 
> get this new message:
> {code}
> $ R
> > library(arrow)
> : option background_thread currently supports pthread only
> {code}
> https://github.com/jemalloc/jemalloc/blob/3d84bd57f4954a17059bd31330ec87d3c1876411/src/background_thread.c#L884-L887
>  is where the message comes from. Tracing that further, 
> {{have_background_thread}} comes from 
> https://github.com/jemalloc/jemalloc/blob/21cfe59ff7b10a61dabe26cd3dbfb7a255e1f5e8/include/jemalloc/internal/jemalloc_preamble.h.in#L205-L211,
>  which gets set in {{configure.ac}} here: 
> https://github.com/jemalloc/jemalloc/blob/d2dddfb82aac9f2212922eb90324e84790704bfe/configure.ac#L2155-L2157
> In sum, on my system, that flag doesn't get set, so 
> {{have_background_thread}} is false, and when that is false and the 
> {{background_thread}} option is true, I get that message printed. And I do 
> not want to see that message.
> cc [~wesm]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6985) [Python] Steadily increasing time to load file using read_parquet

2019-10-24 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-6985:
---
Component/s: Python

> [Python] Steadily increasing time to load file using read_parquet
> -
>
> Key: ARROW-6985
> URL: https://issues.apache.org/jira/browse/ARROW-6985
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.13.0, 0.14.0, 0.15.0
>Reporter: Casey
>Priority: Minor
>
> I've noticed that reading from parquet using pandas read_parquet function is 
> taking steadily longer with each invocation. I've seen the other ticket about 
> memory usage but I'm seeing no memory impact just steadily increasing read 
> time until I restart the python session.
> Below is some code to reproduce my results. I notice it's particularly bad on 
> wide matrices, especially using pyarrow==0.15.0
> {code:python}
> import pyarrow.parquet as pq
> import pyarrow as pa
> import pandas as pd
> import os
> import numpy as np
> import time
> file = "skinny_matrix.pq"
> if not os.path.isfile(file):
>     mat = np.zeros((6000, 26000))
>     mat.ravel()[::100] = np.random.randn(60 * 26000)
>     df = pd.DataFrame(mat.T)
>     table = pa.Table.from_pandas(df)
>     pq.write_table(table, file)
> n_timings = 50
> timings = np.empty(n_timings)
> for i in range(n_timings):
>     start = time.time()
>     new_df = pd.read_parquet(file)
>     end = time.time()
>     timings[i] = end - start
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6985) [Python] Steadily increasing time to load file using read_parquet

2019-10-24 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-6985:
---
Summary: [Python] Steadily increasing time to load file using read_parquet  
(was: Steadily increasing time to load file using read_parquet)

> [Python] Steadily increasing time to load file using read_parquet
> -
>
> Key: ARROW-6985
> URL: https://issues.apache.org/jira/browse/ARROW-6985
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.13.0, 0.14.0, 0.15.0
>Reporter: Casey
>Priority: Minor
>
> I've noticed that reading from parquet using pandas read_parquet function is 
> taking steadily longer with each invocation. I've seen the other ticket about 
> memory usage but I'm seeing no memory impact just steadily increasing read 
> time until I restart the python session.
> Below is some code to reproduce my results. I notice it's particularly bad on 
> wide matrices, especially using pyarrow==0.15.0
> {code:python}
> import pyarrow.parquet as pq
> import pyarrow as pa
> import pandas as pd
> import os
> import numpy as np
> import time
> file = "skinny_matrix.pq"
> if not os.path.isfile(file):
>     mat = np.zeros((6000, 26000))
>     mat.ravel()[::100] = np.random.randn(60 * 26000)
>     df = pd.DataFrame(mat.T)
>     table = pa.Table.from_pandas(df)
>     pq.write_table(table, file)
> n_timings = 50
> timings = np.empty(n_timings)
> for i in range(n_timings):
>     start = time.time()
>     new_df = pd.read_parquet(file)
>     end = time.time()
>     timings[i] = end - start
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6985) Steadily increasing time to load file using read_parquet

2019-10-24 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-6985:
---
Fix Version/s: (was: 0.15.0)
   (was: 0.14.0)
   (was: 0.13.0)

> Steadily increasing time to load file using read_parquet
> 
>
> Key: ARROW-6985
> URL: https://issues.apache.org/jira/browse/ARROW-6985
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.13.0, 0.14.0, 0.15.0
>Reporter: Casey
>Priority: Minor
>
> I've noticed that reading from parquet using pandas read_parquet function is 
> taking steadily longer with each invocation. I've seen the other ticket about 
> memory usage but I'm seeing no memory impact just steadily increasing read 
> time until I restart the python session.
> Below is some code to reproduce my results. I notice it's particularly bad on 
> wide matrices, especially using pyarrow==0.15.0
> {code:python}
> import pyarrow.parquet as pq
> import pyarrow as pa
> import pandas as pd
> import os
> import numpy as np
> import time
> file = "skinny_matrix.pq"
> if not os.path.isfile(file):
>     mat = np.zeros((6000, 26000))
>     mat.ravel()[::100] = np.random.randn(60 * 26000)
>     df = pd.DataFrame(mat.T)
>     table = pa.Table.from_pandas(df)
>     pq.write_table(table, file)
> n_timings = 50
> timings = np.empty(n_timings)
> for i in range(n_timings):
>     start = time.time()
>     new_df = pd.read_parquet(file)
>     end = time.time()
>     timings[i] = end - start
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6340) [R] Implements low-level bindings to Dataset classes

2019-10-24 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-6340:
---
Description: 
The following classes should be accessible from R:

 * class DataSource
 * class DataSourceDiscovery
 * class Dataset
 * class ScanContext, ScanOptions, ScanTask
 * class ScannerBuilder
 * class Scanner

The end result is reading a directory of parquet files as a single stream. One 
should be able to re-implement [https://github.com/apache/arrow/pull/5720] in R.

  was:
The following classes should be accessible from R:

* class DataSource
* class DataFragment
* function DiscoverySource
* class ScanContext, ScanOptions, ScanTask
* class Dataset
* class ScannerBuilder
* class Scanner

The end result is reading a directory of parquet files as a single stream


> [R] Implements low-level bindings to Dataset classes
> 
>
> Key: ARROW-6340
> URL: https://issues.apache.org/jira/browse/ARROW-6340
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Francois Saint-Jacques
>Assignee: Romain Francois
>Priority: Major
>  Labels: dataset, pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The following classes should be accessible from R:
>  * class DataSource
>  * class DataSourceDiscovery
>  * class Dataset
>  * class ScanContext, ScanOptions, ScanTask
>  * class ScannerBuilder
>  * class Scanner
> The end result is reading a directory of parquet files as a single stream. 
> One should be able to re-implement 
> [https://github.com/apache/arrow/pull/5720] in R.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-6984) [C++] Update LZ4 to 1.9.2 for CVE-2019-17543

2019-10-24 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-6984:
--

Assignee: Krisztian Szucs

> [C++] Update LZ4 to 1.9.2 for CVE-2019-17543
> 
>
> Key: ARROW-6984
> URL: https://issues.apache.org/jira/browse/ARROW-6984
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Affects Versions: 0.15.0
>Reporter: Sangeeth Keeriyadath
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There is a reported CVE: LZ4 before 1.9.2 has a heap-based buffer 
> overflow in LZ4_write32 (details at 
> [https://nvd.nist.gov/vuln/detail/CVE-2019-17543] ). Apache Arrow currently 
> uses version *v1.8.3* ( 
> [https://github.com/apache/arrow/blob/47e5ecafa72b70112a64a1174b29b9db45f803ef/cpp/thirdparty/versions.txt#L38]
>  ).
> We need to bump the LZ4 dependency to *1.9.2* to address the 
> reported CVE. Thank you!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-24 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-6977:
-

Assignee: Antoine Pitrou

> [C++] Only enable jemalloc background_thread if feature is supported
> 
>
> Key: ARROW-6977
> URL: https://issues.apache.org/jira/browse/ARROW-6977
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: macOS 10.14, Homebrew
>Reporter: Neal Richardson
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0, 0.15.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Followup to ARROW-6910. When loading the R package after that patch merged, I 
> get this new message:
> {code}
> $ R
> > library(arrow)
> : option background_thread currently supports pthread only
> {code}
> https://github.com/jemalloc/jemalloc/blob/3d84bd57f4954a17059bd31330ec87d3c1876411/src/background_thread.c#L884-L887
>  is where the message comes from. Tracing that further, 
> {{have_background_thread}} comes from 
> https://github.com/jemalloc/jemalloc/blob/21cfe59ff7b10a61dabe26cd3dbfb7a255e1f5e8/include/jemalloc/internal/jemalloc_preamble.h.in#L205-L211,
>  which gets set in {{configure.ac}} here: 
> https://github.com/jemalloc/jemalloc/blob/d2dddfb82aac9f2212922eb90324e84790704bfe/configure.ac#L2155-L2157
> In sum, on my system, that flag doesn't get set, so 
> {{have_background_thread}} is false, and when that is false and the 
> {{background_thread}} option is true, I get that message printed. And I do 
> not want to see that message.
> cc [~wesm]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6983) [C++] Threaded task group crashes sometimes

2019-10-24 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6983:
--
Fix Version/s: 1.0.0

> [C++] Threaded task group crashes sometimes
> ---
>
> Key: ARROW-6983
> URL: https://issues.apache.org/jira/browse/ARROW-6983
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0, 0.15.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> You can give this a more descriptive title :)
> See discussion on ARROW-6977. 
> https://gist.github.com/pitrou/87f3091c226db3306c45b2c32dd9aea8 seems to fix 
> it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6977:
--
Labels: pull-request-available  (was: )

> [C++] Only enable jemalloc background_thread if feature is supported
> 
>
> Key: ARROW-6977
> URL: https://issues.apache.org/jira/browse/ARROW-6977
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: macOS 10.14, Homebrew
>Reporter: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0, 0.15.1
>
>
> Followup to ARROW-6910. When loading the R package after that patch merged, I 
> get this new message:
> {code}
> $ R
> > library(arrow)
> : option background_thread currently supports pthread only
> {code}
> https://github.com/jemalloc/jemalloc/blob/3d84bd57f4954a17059bd31330ec87d3c1876411/src/background_thread.c#L884-L887
>  is where the message comes from. Tracing that further, 
> {{have_background_thread}} comes from 
> https://github.com/jemalloc/jemalloc/blob/21cfe59ff7b10a61dabe26cd3dbfb7a255e1f5e8/include/jemalloc/internal/jemalloc_preamble.h.in#L205-L211,
>  which gets set in {{configure.ac}} here: 
> https://github.com/jemalloc/jemalloc/blob/d2dddfb82aac9f2212922eb90324e84790704bfe/configure.ac#L2155-L2157
> In sum, on my system, that flag doesn't get set, so 
> {{have_background_thread}} is false, and when that is false and the 
> {{background_thread}} option is true, I get that message printed. And I do 
> not want to see that message.
> cc [~wesm]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6961) [C++][Gandiva] Add lower_utf8 function in Gandiva

2019-10-24 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-6961.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5712
[https://github.com/apache/arrow/pull/5712]

> [C++][Gandiva] Add lower_utf8 function in Gandiva
> -
>
> Key: ARROW-6961
> URL: https://issues.apache.org/jira/browse/ARROW-6961
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++ - Gandiva
>Reporter: Prudhvi Porandla
>Assignee: Prudhvi Porandla
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Function signature is {{utf8 lower(utf8)}}. Converts a utf8 sequence to 
> lower case.
> This handles only ASCII characters.
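
To illustrate what "handles only ASCII characters" implies, here is a minimal Python sketch. It is not Gandiva's implementation, and the function name `ascii_lower_utf8` is hypothetical. It lowercases the bytes for `A` to `Z` and passes every other byte through, so multi-byte UTF-8 characters are left unchanged.

```python
def ascii_lower_utf8(data: bytes) -> bytes:
    """Lowercase ASCII letters in a UTF-8 byte string; leave other bytes alone.

    Hypothetical helper for illustration only, not the Gandiva function.
    """
    out = bytearray(data)
    for i, b in enumerate(out):
        # Every byte of a multi-byte UTF-8 sequence is >= 0x80,
        # so it can never fall in the 'A'..'Z' range tested here.
        if 0x41 <= b <= 0x5A:  # 'A'..'Z'
            out[i] = b + 0x20
    return bytes(out)
```

Under this approach a non-ASCII uppercase letter such as "À" passes through untouched, which is the limitation the description mentions.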



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6984) [C++] Update LZ4 to 1.9.2 for CVE-2019-17543

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6984:
--
Labels: pull-request-available  (was: )

> [C++] Update LZ4 to 1.9.2 for CVE-2019-17543
> 
>
> Key: ARROW-6984
> URL: https://issues.apache.org/jira/browse/ARROW-6984
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Affects Versions: 0.15.0
>Reporter: Sangeeth Keeriyadath
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.1
>
>
> There is a reported CVE: LZ4 before 1.9.2 has a heap-based buffer 
> overflow in LZ4_write32 (details at 
> [https://nvd.nist.gov/vuln/detail/CVE-2019-17543] ). Apache Arrow currently 
> uses version *v1.8.3* ( 
> [https://github.com/apache/arrow/blob/47e5ecafa72b70112a64a1174b29b9db45f803ef/cpp/thirdparty/versions.txt#L38]
>  ).
> We need to bump the LZ4 dependency to *1.9.2* to address the 
> reported CVE. Thank you!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6984) [C++] Update LZ4 to 1.9.2 for CVE-2019-17543

2019-10-24 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-6984:
---
Summary: [C++] Update LZ4 to 1.9.2 for CVE-2019-17543  (was: Update LZ4 to 
1.9.2 for CVE-2019-17543)

> [C++] Update LZ4 to 1.9.2 for CVE-2019-17543
> 
>
> Key: ARROW-6984
> URL: https://issues.apache.org/jira/browse/ARROW-6984
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Affects Versions: 0.15.0
>Reporter: Sangeeth Keeriyadath
>Priority: Major
> Fix For: 0.15.1
>
>
> There is a reported CVE: LZ4 before 1.9.2 has a heap-based buffer 
> overflow in LZ4_write32 (details at 
> [https://nvd.nist.gov/vuln/detail/CVE-2019-17543] ). Apache Arrow currently 
> uses version *v1.8.3* ( 
> [https://github.com/apache/arrow/blob/47e5ecafa72b70112a64a1174b29b9db45f803ef/cpp/thirdparty/versions.txt#L38]
>  ).
> We need to bump the LZ4 dependency to *1.9.2* to address the 
> reported CVE. Thank you!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6825) [C++] Rework CSV reader IO around readahead iterator

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6825:
--
Labels: pull-request-available  (was: )

> [C++] Rework CSV reader IO around readahead iterator
> 
>
> Key: ARROW-6825
> URL: https://issues.apache.org/jira/browse/ARROW-6825
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> Following ARROW-6764, we should try to remove the custom ReadaheadSpooler and 
> use the generic readahead iteration facility instead. This will require 
> reworking the blocking / chunking logic to mimic what is done in the JSON 
> reader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-6952) [C++][Dataset] Ensure expression filter is passed ParquetDataFragment

2019-10-24 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman reassigned ARROW-6952:
---

Assignee: Ben Kietzman

> [C++][Dataset] Ensure expression filter is passed ParquetDataFragment
> -
>
> Key: ARROW-6952
> URL: https://issues.apache.org/jira/browse/ARROW-6952
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Francois Saint-Jacques
>Assignee: Ben Kietzman
>Priority: Major
>  Labels: dataset
>
> We should be able to prune RowGroups based on the expression and the 
> statistics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-6299) [C++] Simplify FileFormat classes to singletons

2019-10-24 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman closed ARROW-6299.
---
Resolution: Won't Do

> [C++] Simplify FileFormat classes to singletons
> ---
>
> Key: ARROW-6299
> URL: https://issues.apache.org/jira/browse/ARROW-6299
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Ben Kietzman
>Assignee: Ben Kietzman
>Priority: Minor
>  Labels: dataset
> Fix For: 1.0.0
>
>
> ParquetFileFormat has no state, so passing it around by 
> shared_ptr is not necessary; we could just keep a single static 
> instance and pass raw pointers.
> [~wesmckinn] is there a case where a FileFormat might have state?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6299) [C++] Simplify FileFormat classes to singletons

2019-10-24 Thread Ben Kietzman (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958951#comment-16958951
 ] 

Ben Kietzman commented on ARROW-6299:
-

In the case of CSV, it's most natural to consider files with a comma separator 
and files with a tab separator as different formats.
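
A small Python sketch of that point, assuming a hypothetical `CsvFormat` class (not an Arrow API): once the delimiter is part of the format object, two instances are no longer interchangeable, so a single stateless singleton cannot represent both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CsvFormat:
    """Hypothetical stand-in for a file format that carries state."""

    delimiter: str = ","

    def split(self, line: str) -> list:
        # The parsing result depends on the instance's state.
        return line.split(self.delimiter)


comma = CsvFormat(",")
tab = CsvFormat("\t")
```

Here `comma` and `tab` compare unequal and parse the same line differently, which is why treating them as distinct formats is natural.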

> [C++] Simplify FileFormat classes to singletons
> ---
>
> Key: ARROW-6299
> URL: https://issues.apache.org/jira/browse/ARROW-6299
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Ben Kietzman
>Assignee: Ben Kietzman
>Priority: Minor
>  Labels: dataset
> Fix For: 1.0.0
>
>
> ParquetFileFormat has no state, so passing it around by 
> shared_ptr is not necessary; we could just keep a single static 
> instance and pass raw pointers.
> [~wesmckinn] is there a case where a FileFormat might have state?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6964) [C++][Dataset] Expose a nested parallel option for Scanner::ToTable

2019-10-24 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman updated ARROW-6964:

Summary: [C++][Dataset] Expose a nested parallel option for 
Scanner::ToTable  (was: [C++][Dataset] Expose a nested parellel option for 
Scanner::ToTable)

> [C++][Dataset] Expose a nested parallel option for Scanner::ToTable
> ---
>
> Key: ARROW-6964
> URL: https://issues.apache.org/jira/browse/ARROW-6964
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: dataset, pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6985) Steadily increasing time to load file using read_parquet

2019-10-24 Thread Casey (Jira)
Casey created ARROW-6985:


 Summary: Steadily increasing time to load file using read_parquet
 Key: ARROW-6985
 URL: https://issues.apache.org/jira/browse/ARROW-6985
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 0.15.0, 0.14.0, 0.13.0
Reporter: Casey
 Fix For: 0.15.0, 0.14.0, 0.13.0


I've noticed that reading from parquet using pandas read_parquet function is 
taking steadily longer with each invocation. I've seen the other ticket about 
memory usage but I'm seeing no memory impact just steadily increasing read time 
until I restart the python session.

Below is some code to reproduce my results. I notice it's particularly bad on 
wide matrices, especially using pyarrow==0.15.0
{code:python}
import pyarrow.parquet as pq
import pyarrow as pa
import pandas as pd
import os
import numpy as np
import time

file = "skinny_matrix.pq"

if not os.path.isfile(file):
    mat = np.zeros((6000, 26000))
    mat.ravel()[::100] = np.random.randn(60 * 26000)
    df = pd.DataFrame(mat.T)
    table = pa.Table.from_pandas(df)
    pq.write_table(table, file)

n_timings = 50
timings = np.empty(n_timings)
for i in range(n_timings):
    start = time.time()
    new_df = pd.read_parquet(file)
    end = time.time()
    timings[i] = end - start
{code}
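
One way to quantify "steadily longer" from the `timings` collected above is a least-squares slope over the call index. The sketch below is an illustrative addition, not part of the original report; the helper name `trend_slope` is hypothetical and it uses only the standard library.

```python
def trend_slope(timings):
    """Least-squares slope of timings against their index (seconds per call).

    Hypothetical helper: a clearly positive slope suggests each read is
    slower than the previous one.
    """
    n = len(timings)
    if n < 2:
        raise ValueError("need at least two timings")
    mean_x = (n - 1) / 2
    mean_y = sum(timings) / n
    cov = sum((i - mean_x) * (t - mean_y) for i, t in enumerate(timings))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var
```

Calling `trend_slope(list(timings))` on the array from the reproduction script would give the average increase in read time per invocation.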



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6963) [Packaging] Use crossbow's command to deploy artifacts from travis builds

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6963:
--
Labels: pull-request-available  (was: )

> [Packaging] Use crossbow's command to deploy artifacts from travis builds
> -
>
> Key: ARROW-6963
> URL: https://issues.apache.org/jira/browse/ARROW-6963
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> Travis has started failing more often during artifact deployment to GitHub 
> releases.
> Crossbow has a built-in command to upload the artifacts, which is more reliable.
> All of the Travis builds should use the crossbow script instead of relying on 
> Travis's deployment feature.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6704) [C++] Cast from timestamp to higher resolution does not check out of bounds timestamps

2019-10-24 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-6704.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5623
[https://github.com/apache/arrow/pull/5623]

> [C++] Cast from timestamp to higher resolution does not check out of bounds 
> timestamps
> --
>
> Key: ARROW-6704
> URL: https://issues.apache.org/jira/browse/ARROW-6704
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Joris Van den Bossche
>Assignee: Joris Van den Bossche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> When casting e.g. {{timestamp('s')}} to {{timestamp('ns')}}, we do not check 
> for out-of-bounds timestamps, giving "garbage" timestamps in the result:
> {code}
> In [74]: a_np = np.array(["2012-01-01", "2412-01-01"], dtype="datetime64[s]")
>
> In [75]: arr = pa.array(a_np)
>
> In [76]: arr
> Out[76]: 
> 
> [
>   2012-01-01 00:00:00,
>   2412-01-01 00:00:00
> ]
> In [77]: arr.cast(pa.timestamp('ns'))
> Out[77]: 
> 
> [
>   2012-01-01 00:00:00.0,
>   1827-06-13 00:25:26.290448384
> ]
> {code}
> Now, this is the same behaviour as numpy, so I am not sure we should change it. 
> However, since we have a {{safe=True/False}} option, I would expect that for 
> {{safe=True}} we check this and for {{safe=False}} we do not.  
> (numpy has a similar {{casting='safe'}} option but also does not raise an error in 
> that case).
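
The "garbage" value above is consistent with plain 64-bit wraparound when seconds are multiplied by 10^9. The following standard-library Python sketch reproduces the arithmetic. It is an illustration, not Arrow's cast kernel, and the helper names (`seconds_since_epoch`, `to_ns_wrapping`, `to_ns_checked`) are hypothetical.

```python
from datetime import datetime, timedelta, timezone

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def seconds_since_epoch(dt):
    """Whole seconds between the Unix epoch and dt, using integer arithmetic."""
    delta = dt - EPOCH
    return delta.days * 86400 + delta.seconds


def to_ns_wrapping(seconds):
    """Scale to nanoseconds with int64 wraparound, as an unchecked cast would."""
    return (seconds * 10**9 - INT64_MIN) % 2**64 + INT64_MIN


def to_ns_checked(seconds):
    """Scale to nanoseconds, raising if the result does not fit in int64."""
    ns = seconds * 10**9
    if not INT64_MIN <= ns <= INT64_MAX:
        raise OverflowError(f"{seconds} s does not fit in int64 nanoseconds")
    return ns


secs = seconds_since_epoch(datetime(2412, 1, 1, tzinfo=timezone.utc))
wrapped = to_ns_wrapping(secs)
# Python datetimes have microsecond resolution, so the last three digits
# of the nanosecond value are dropped before converting back.
wrapped_dt = EPOCH + timedelta(microseconds=wrapped // 1000)
```

Here `wrapped_dt` lands on 1827-06-13 00:25:26.290448, matching the output in the report up to microsecond precision, while `to_ns_checked(secs)` raises instead. A `safe=True` cast would be expected to behave like the checked variant.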



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6341) [Python] Implement low-level bindings for Dataset

2019-10-24 Thread Francois Saint-Jacques (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques updated ARROW-6341:
--
Description: 
The following classes should be accessible from Python:
 * class DataSource
 * class DataSourceDiscovery
 * class Dataset
 * class ScanContext, ScanOptions, ScanTask
 * class ScannerBuilder
 * class Scanner

The end result is reading a directory of parquet files as a single stream. One 
should be able to re-implement [https://github.com/apache/arrow/pull/5720] in 
python.

  was:
The following classes should be accessible from Python:

* class DataSource
* class DataFragment
* function DiscoverySource
* class ScanContext, ScanOptions, ScanTask
* class Dataset
* class ScannerBuilder
* class Scanner

The end result is reading a directory of parquet files as a single stream.


> [Python] Implement low-level bindings for Dataset
> -
>
> Key: ARROW-6341
> URL: https://issues.apache.org/jira/browse/ARROW-6341
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Francois Saint-Jacques
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: dataset, pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The following classes should be accessible from Python:
>  * class DataSource
>  * class DataSourceDiscovery
>  * class Dataset
>  * class ScanContext, ScanOptions, ScanTask
>  * class ScannerBuilder
>  * class Scanner
> The end result is reading a directory of Parquet files as a single stream. 
> One should be able to re-implement 
> [https://github.com/apache/arrow/pull/5720] in Python.





[jira] [Created] (ARROW-6984) Update LZ4 to 1.9.2 for CVE-2019-17543

2019-10-24 Thread Sangeeth Keeriyadath (Jira)
Sangeeth Keeriyadath created ARROW-6984:
---

 Summary: Update LZ4 to 1.9.2 for CVE-2019-17543
 Key: ARROW-6984
 URL: https://issues.apache.org/jira/browse/ARROW-6984
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Affects Versions: 0.15.0
Reporter: Sangeeth Keeriyadath
 Fix For: 0.15.1


There is a reported CVE: LZ4 before 1.9.2 has a heap-based buffer overflow 
in LZ4_write32 (more details at 
[https://nvd.nist.gov/vuln/detail/CVE-2019-17543]). Apache Arrow currently 
uses version *v1.8.3* 
([https://github.com/apache/arrow/blob/47e5ecafa72b70112a64a1174b29b9db45f803ef/cpp/thirdparty/versions.txt#L38]).

We need to bump the LZ4 dependency to *1.9.2* to address the 
reported CVE. Thank you!





[jira] [Resolved] (ARROW-6970) [Packaging][RPM] Add support for CentOS 8

2019-10-24 Thread Francois Saint-Jacques (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques resolved ARROW-6970.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5715
[https://github.com/apache/arrow/pull/5715]

> [Packaging][RPM] Add support for CentOS 8
> -
>
> Key: ARROW-6970
> URL: https://issues.apache.org/jira/browse/ARROW-6970
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-6948) [Rust] [Parquet] Fix bool array support in arrow reader.

2019-10-24 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-6948.

Resolution: Fixed

Issue resolved by pull request 5705
[https://github.com/apache/arrow/pull/5705]

> [Rust] [Parquet] Fix bool array support in arrow reader.
> 
>
> Key: ARROW-6948
> URL: https://issues.apache.org/jira/browse/ARROW-6948
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-6969) [C++][Dataset] ParquetScanTask eagerly load file

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6969:
--
Labels: dataset pull-request-available  (was: dataset)

> [C++][Dataset] ParquetScanTask eagerly load file 
> -
>
> Key: ARROW-6969
> URL: https://issues.apache.org/jira/browse/ARROW-6969
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: dataset, pull-request-available
>
> The file content should only be read when invoking ParquetScanTask::Scan, not 
> on construction. Loading the file eagerly prevents reading in a true 
> streaming fashion under memory constraints.
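
The difference between reading on construction and reading on scan can be sketched as follows. This is an illustration in Python only, not the C++ ParquetScanTask: LazyScanTask and its chunk_size parameter are hypothetical, and raw byte chunks stand in for record batches.

```python
import os
import tempfile


class LazyScanTask:
    """Hypothetical stand-in for a scan task; not Arrow's actual API."""

    def __init__(self, path):
        # Construction only records the path; no file I/O happens here.
        self.path = path

    def scan(self, chunk_size=4):
        # The file is opened and read only once iteration starts, and only
        # one chunk is held in memory at a time.
        with open(self.path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                yield chunk


# Constructing a task for a missing file succeeds, because nothing is read yet.
pending = LazyScanTask(os.path.join(tempfile.gettempdir(), "does-not-exist.bin"))

# Reading happens chunk by chunk when the task is scanned.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as f:
        f.write(b"abcdefghij")
    chunks = list(LazyScanTask(path).scan(chunk_size=4))

print(chunks)
```

With this shape, many tasks can be created up front (one per file) without their combined contents ever being resident in memory at once.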





[jira] [Commented] (ARROW-6925) [C++] Arrow fails to build on MacOS 10.13.6 using brew gcc 7 and 8

2019-10-24 Thread Francois Saint-Jacques (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958862#comment-16958862
 ] 

Francois Saint-Jacques commented on ARROW-6925:
---

Noted, I found out how to do it from now on.

> [C++] Arrow fails to build on MacOS 10.13.6 using brew gcc 7 and 8
> -
>
> Key: ARROW-6925
> URL: https://issues.apache.org/jira/browse/ARROW-6925
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: MacOS 10.13.6 using both brew gcc 7 and 8.
>Reporter: John Norris
>Assignee: John Norris
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Both SetupCxxFlags.cmake and ThirdpartyToolchain.cmake add -stdlib=libc++ to 
> the compiler flags when APPLE is true, but if you're using GCC from brew (or 
> presumably from anywhere other than Apple), this flag is not recognized and 
> the build fails.





[jira] [Resolved] (ARROW-6949) [Java] Fix promotable write to handle nullvectors

2019-10-24 Thread Praveen Kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Praveen Kumar resolved ARROW-6949.
--
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5698
[https://github.com/apache/arrow/pull/5698]

> [Java] Fix promotable write to handle nullvectors
> -
>
> Key: ARROW-6949
> URL: https://issues.apache.org/jira/browse/ARROW-6949
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Prudhvi Porandla
>Assignee: Prudhvi Porandla
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-6983) [C++] Threaded task group crashes sometimes

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6983:
--
Labels: pull-request-available  (was: )

> [C++] Threaded task group crashes sometimes
> ---
>
> Key: ARROW-6983
> URL: https://issues.apache.org/jira/browse/ARROW-6983
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.1
>
>
> You can give this a more descriptive title :)
> See discussion on ARROW-6977. 
> https://gist.github.com/pitrou/87f3091c226db3306c45b2c32dd9aea8 seems to fix 
> it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6944) [Rust] Add StringType

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6944:
--
Labels: pull-request-available  (was: )

> [Rust] Add StringType
> -
>
> Key: ARROW-6944
> URL: https://issues.apache.org/jira/browse/ARROW-6944
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Neville Dipale
>Assignee: Neville Dipale
>Priority: Major
>  Labels: pull-request-available
>
> Create a separate String type that uses UTF-8, and restrict the BinaryArray 
> to opaque binary data.





[jira] [Updated] (ARROW-6948) [Rust] [Parquet] Fix bool array support in arrow reader.

2019-10-24 Thread Neville Dipale (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale updated ARROW-6948:
--
Fix Version/s: 1.0.0

> [Rust] [Parquet] Fix bool array support in arrow reader.
> 
>
> Key: ARROW-6948
> URL: https://issues.apache.org/jira/browse/ARROW-6948
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-6948) [Rust] [Parquet] Fix bool array support in arrow reader.

2019-10-24 Thread Neville Dipale (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale updated ARROW-6948:
--
Component/s: Rust

> [Rust] [Parquet] Fix bool array support in arrow reader.
> 
>
> Key: ARROW-6948
> URL: https://issues.apache.org/jira/browse/ARROW-6948
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>



