[jira] [Created] (ARROW-7840) [Java] [Integration] Java executables fail

2020-02-12 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7840:
-

 Summary: [Java] [Integration] Java executables fail
 Key: ARROW-7840
 URL: https://issues.apache.org/jira/browse/ARROW-7840
 Project: Apache Arrow
  Issue Type: Bug
  Components: Integration, Java
Reporter: Antoine Pitrou
 Fix For: 1.0.0


When trying to run integration tests using {{docker-compose run 
conda-integration}}, I always get failures during the Java tests:
{code}
RuntimeError: Command failed: ['java', 
'-Dio.netty.tryReflectionSetAccessible=true', '-cp', 
'/arrow/java/tools/target/arrow-tools-1.0.0-SNAPSHOT-jar-with-dependencies.jar',
 'org.apache.arrow.tools.StreamToFile', 
'/tmp/tmpqbkrmpo1/e75ed336_simple.producer_file_as_stream', 
'/tmp/tmpqbkrmpo1/e75ed336_simple.consumer_stream_as_file']
With output:
--
15:57:01.194 [main] DEBUG io.netty.util.internal.logging.InternalLoggerFactory 
- Using SLF4J as the default logging framework
15:57:01.196 [main] DEBUG io.netty.util.ResourceLeakDetector - 
-Dio.netty.leakDetection.level: simple
15:57:01.196 [main] DEBUG io.netty.util.ResourceLeakDetector - 
-Dio.netty.leakDetection.targetRecords: 4
15:57:01.208 [main] DEBUG io.netty.util.internal.PlatformDependent0 - 
-Dio.netty.noUnsafe: false
15:57:01.209 [main] DEBUG io.netty.util.internal.PlatformDependent0 - Java 
version: 8
15:57:01.210 [main] DEBUG io.netty.util.internal.PlatformDependent0 - 
sun.misc.Unsafe.theUnsafe: available
15:57:01.210 [main] DEBUG io.netty.util.internal.PlatformDependent0 - 
sun.misc.Unsafe.copyMemory: available
15:57:01.210 [main] DEBUG io.netty.util.internal.PlatformDependent0 - 
java.nio.Buffer.address: available
15:57:01.210 [main] DEBUG io.netty.util.internal.PlatformDependent0 - direct 
buffer constructor: available
15:57:01.211 [main] DEBUG io.netty.util.internal.PlatformDependent0 - 
java.nio.Bits.unaligned: available, true
15:57:01.211 [main] DEBUG io.netty.util.internal.PlatformDependent0 - 
jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable prior to 
Java9
15:57:01.211 [main] DEBUG io.netty.util.internal.PlatformDependent0 - 
java.nio.DirectByteBuffer.<init>(long, int): available
15:57:01.211 [main] DEBUG io.netty.util.internal.PlatformDependent - 
sun.misc.Unsafe: available
15:57:01.211 [main] DEBUG io.netty.util.internal.PlatformDependent - 
-Dio.netty.tmpdir: /tmp (java.io.tmpdir)
15:57:01.211 [main] DEBUG io.netty.util.internal.PlatformDependent - 
-Dio.netty.bitMode: 64 (sun.arch.data.model)
15:57:01.212 [main] DEBUG io.netty.util.internal.PlatformDependent - 
-Dio.netty.noPreferDirect: false
15:57:01.212 [main] DEBUG io.netty.util.internal.PlatformDependent - 
-Dio.netty.maxDirectMemory: 11252269056 bytes
15:57:01.212 [main] DEBUG io.netty.util.internal.PlatformDependent - 
-Dio.netty.uninitializedArrayAllocationThreshold: -1
15:57:01.213 [main] DEBUG io.netty.util.internal.CleanerJava6 - 
java.nio.ByteBuffer.cleaner(): available
15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
-Dio.netty.allocator.numHeapArenas: 48
15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
-Dio.netty.allocator.numDirectArenas: 48
15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
-Dio.netty.allocator.pageSize: 8192
15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
-Dio.netty.allocator.maxOrder: 11
15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
-Dio.netty.allocator.chunkSize: 16777216
15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
-Dio.netty.allocator.tinyCacheSize: 512
15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
-Dio.netty.allocator.smallCacheSize: 256
15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
-Dio.netty.allocator.normalCacheSize: 64
15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
-Dio.netty.allocator.maxCachedBufferCapacity: 32768
15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
-Dio.netty.allocator.cacheTrimInterval: 8192
15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
-Dio.netty.allocator.useCacheForAllThreads: true
15:57:01.216 [main] DEBUG io.netty.util.internal.InternalThreadLocalMap - 
-Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
15:57:01.216 [main] DEBUG io.netty.util.internal.InternalThreadLocalMap - 
-Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
15:57:01.228 [main] DEBUG io.netty.buffer.AbstractByteBuf - 
-Dio.netty.buffer.bytebuf.checkAccessible: true
15:57:01.228 [main] DEBUG io.netty.util.ResourceLeakDetectorFactory - Loaded 
default ResourceLeakDetector: io.netty.util.ResourceLeakDetector@71bc1ae4
15:57:01.242 [main] DEBUG org.apache.arrow.vector.ipc.ReadChannel - Reading 
buffer with size: 4
15:57:01.242 [main] DEBUG org.apache.arrow.vector.ipc.ReadChannel - Reading 
buffer with size: 4
15:57:01.242 [main] DEBUG 

[jira] [Created] (ARROW-7944) [Python] Test failures without Pandas

2020-02-26 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7944:
-

 Summary: [Python] Test failures without Pandas
 Key: ARROW-7944
 URL: https://issues.apache.org/jira/browse/ARROW-7944
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Antoine Pitrou
 Fix For: 1.0.0


I recently saw this:
https://ci.appveyor.com/project/pitrou/arrow/builds/31065781/job/p08i1nrstf9wl2kr#L1964

{code}
== FAILURES ===
_ test_builtin_pickle_dataset _
tempdir = WindowsPath('C:/Users/appveyor/AppData/Local/Temp/1/pytest-of-appveyor/pytest-0/test_builtin_pickle_dataset0')
datadir = WindowsPath('c:/projects/arrow/python/pyarrow/tests/data/parquet')
    def test_builtin_pickle_dataset(tempdir, datadir):
        import pickle
>       dataset = _make_dataset_for_pickling(tempdir)
pyarrow\tests\test_parquet.py:2821: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tempdir = WindowsPath('C:/Users/appveyor/AppData/Local/Temp/1/pytest-of-appveyor/pytest-0/test_builtin_pickle_dataset0')
N = 100
    def _make_dataset_for_pickling(tempdir, N=100):
        path = tempdir / 'data.parquet'
        fs = LocalFileSystem.get_instance()

>       df = pd.DataFrame({
            'index': np.arange(N),
            'values': np.random.randn(N)
        }, columns=['index', 'values'])
E       AttributeError: 'NoneType' object has no attribute 'DataFrame'
pyarrow\tests\test_parquet.py:2776: AttributeError
__ test_cloudpickle_dataset ___
tempdir = WindowsPath('C:/Users/appveyor/AppData/Local/Temp/1/pytest-of-appveyor/pytest-0/test_cloudpickle_dataset0')
datadir = WindowsPath('c:/projects/arrow/python/pyarrow/tests/data/parquet')
    def test_cloudpickle_dataset(tempdir, datadir):
        cp = pytest.importorskip('cloudpickle')
>       dataset = _make_dataset_for_pickling(tempdir)
pyarrow\tests\test_parquet.py:2827: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tempdir = WindowsPath('C:/Users/appveyor/AppData/Local/Temp/1/pytest-of-appveyor/pytest-0/test_cloudpickle_dataset0')
N = 100
    def _make_dataset_for_pickling(tempdir, N=100):
        path = tempdir / 'data.parquet'
        fs = LocalFileSystem.get_instance()

>       df = pd.DataFrame({
            'index': np.arange(N),
            'values': np.random.randn(N)
        }, columns=['index', 'values'])
E       AttributeError: 'NoneType' object has no attribute 'DataFrame'
pyarrow\tests\test_parquet.py:2776: AttributeError
{code}
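
The {{'NoneType' object has no attribute 'DataFrame'}} message suggests that the test module binds {{pd}} to {{None}} when pandas is missing (an inference from the traceback, not something stated above). A minimal sketch of one way to guard such a helper, using only the standard library; {{import_or_skip}} and {{make_frame_for_pickling}} are hypothetical names, and this is not necessarily the fix applied in Arrow:

```python
import importlib
import unittest


def import_or_skip(name):
    """Return the imported module, or skip the calling test if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # unittest.SkipTest marks the running test as skipped instead of failed.
        raise unittest.SkipTest(f"{name} is not installed") from exc


def make_frame_for_pickling(n=100):
    # Hypothetical stand-in for a test helper that needs pandas.
    pd = import_or_skip("pandas")
    return pd.DataFrame({"index": range(n)})
```

With this shape, a test that calls the helper is skipped on a pandas-free build rather than failing with an {{AttributeError}}.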



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7999) [C++] Fix crash on corrupt Map array input (OSS-Fuzz)

2020-03-04 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7999:
-

 Summary: [C++] Fix crash on corrupt Map array input (OSS-Fuzz)
 Key: ARROW-7999
 URL: https://issues.apache.org/jira/browse/ARROW-7999
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou
 Fix For: 1.0.0








[jira] [Created] (ARROW-7982) [C++] Let ArrayDataVisitor accept void-returning functions

2020-03-02 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7982:
-

 Summary: [C++] Let ArrayDataVisitor accept void-returning functions
 Key: ARROW-7982
 URL: https://issues.apache.org/jira/browse/ARROW-7982
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Reporter: Antoine Pitrou


It would be nice if {{ArrayDataVisitor}} accepted a visitor struct with 
void-returning (instead of Status-returning) methods. Always-ok Status may not 
be entirely optimized away by the compiler in some situations.






[jira] [Created] (ARROW-7995) [C++] IO: coalescing and caching read ranges

2020-03-03 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7995:
-

 Summary: [C++] IO: coalescing and caching read ranges
 Key: ARROW-7995
 URL: https://issues.apache.org/jira/browse/ARROW-7995
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Reporter: Antoine Pitrou


This will be useful in order to improve Parquet reading performance on remote / 
high-latency filesystems.
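
As a rough illustration of the coalescing half of this idea, the sketch below merges byte ranges whose gaps are small enough that a single larger read is likely cheaper than two round trips. The function name, the {{(offset, length)}} representation and the {{hole_size_limit}} default are assumptions made for the example, not part of any existing Arrow API:

```python
def coalesce_ranges(ranges, hole_size_limit=8192):
    """Merge (offset, length) ranges whose gaps are at most hole_size_limit bytes."""
    merged = []  # list of [start, end) pairs, kept sorted by start
    for offset, length in sorted(ranges):
        if length <= 0:
            continue  # nothing to read for empty ranges
        end = offset + length
        if merged and offset - merged[-1][1] <= hole_size_limit:
            # Close enough to the previous range (or overlapping): extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([offset, end])
    return [(start, end - start) for start, end in merged]
```

A caching layer could then issue one read per merged range and serve the original, smaller ranges as slices of the cached bytes.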





[jira] [Created] (ARROW-7994) [CI][C++] Move AppVeyor MinGW builds to Github Actions

2020-03-03 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7994:
-

 Summary: [CI][C++] Move AppVeyor MinGW builds to Github Actions
 Key: ARROW-7994
 URL: https://issues.apache.org/jira/browse/ARROW-7994
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++, Continuous Integration, Ruby
Reporter: Antoine Pitrou


To lighten the load on AppVeyor a bit (where we often have queues building up), 
it would be nice to move the MinGW builds to Github Actions.





[jira] [Created] (ARROW-8011) [C++] Some buffers not resized when reading from Parquet

2020-03-05 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8011:
-

 Summary: [C++] Some buffers not resized when reading from Parquet
 Key: ARROW-8011
 URL: https://issues.apache.org/jira/browse/ARROW-8011
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou


This may leak uninitialized data:
{code:python}
>>> import io
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> table = pa.Table.from_pydict({"a": pa.array([0, None, None])})
>>> table.column("a").chunk(0).buffers()[1].to_pybytes()
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> bio = io.BytesIO()
>>> pq.write_table(table, bio, use_dictionary=False)
>>> bio.seek(0)
0
>>> table = pq.read_table(bio)
>>> table.column("a").chunk(0).buffers()[1].to_pybytes()
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
{code}






[jira] [Created] (ARROW-7930) [Python][CI] Test jpype integration in CI

2020-02-24 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7930:
-

 Summary: [Python][CI] Test jpype integration in CI
 Key: ARROW-7930
 URL: https://issues.apache.org/jira/browse/ARROW-7930
 Project: Apache Arrow
  Issue Type: Task
  Components: Continuous Integration, Python
Reporter: Antoine Pitrou


We used to test jpype integration on Travis-CI, but this wasn't transferred to 
the GHA setup.

Perhaps we need a nightly build or crossbow task for it.





[jira] [Created] (ARROW-7931) [C++] Fix crash on corrupt Map array input (OSS-Fuzz)

2020-02-24 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7931:
-

 Summary: [C++] Fix crash on corrupt Map array input (OSS-Fuzz)
 Key: ARROW-7931
 URL: https://issues.apache.org/jira/browse/ARROW-7931
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou








[jira] [Created] (ARROW-7948) [Go][Integration] Decimal integration failures

2020-02-26 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7948:
-

 Summary: [Go][Integration] Decimal integration failures
 Key: ARROW-7948
 URL: https://issues.apache.org/jira/browse/ARROW-7948
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go, Integration
Reporter: Antoine Pitrou


If I un-skip decimal data for integration tests with Go, I get some errors 
such as:
{code}
==
Testing file /tmp/tmpkz1_ydgp/generated_decimal.json
==
-- Creating binary inputs
-- Validating file
Traceback (most recent call last):
  File "/arrow/dev/archery/archery/integration/util.py", line 130, in run_cmd
output = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/opt/conda/envs/arrow/lib/python3.8/subprocess.py", line 411, in 
check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/opt/conda/envs/arrow/lib/python3.8/subprocess.py", line 512, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 
'['/opt/go/bin/arrow-json-integration-test', '-arrow', 
'/tmp/tmpj9b4jgvi/10cae236_generated_decimal.json_as_file', '-json', 
'/tmp/tmpkz1_ydgp/generated_decimal.json', '-mode', 'VALIDATE']' returned 
non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/arrow/dev/archery/archery/integration/runner.py", line 193, in 
_run_ipc_test_case
run_binaries(producer, consumer, outcome, test_case)
  File "/arrow/dev/archery/archery/integration/runner.py", line 219, in 
_produce_consume
consumer.validate(json_path, producer_file_path)
  File "/arrow/dev/archery/archery/integration/tester_go.py", line 55, in 
validate
return self._run(arrow_path, json_path, 'VALIDATE')
  File "/arrow/dev/archery/archery/integration/tester_go.py", line 52, in _run
run_cmd(cmd)
  File "/arrow/dev/archery/archery/integration/util.py", line 139, in run_cmd
raise RuntimeError(sio.getvalue())
RuntimeError: Command failed: ['/opt/go/bin/arrow-json-integration-test', 
'-arrow', '/tmp/tmpj9b4jgvi/10cae236_generated_decimal.json_as_file', '-json', 
'/tmp/tmpkz1_ydgp/generated_decimal.json', '-mode', 'VALIDATE']
With output:
--
arrow-json: could not open JSON file reader from file 
"/tmp/tmpkz1_ydgp/generated_decimal.json": json: cannot unmarshal number into 
Go struct field dataType.precision of type string

--
{code}
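
The final error indicates a type mismatch: the generated JSON encodes {{precision}} as a number, while the Go reader's struct field expects a string. Which side should change is not stated here. As a hedged illustration only, a reader that tolerates both encodings could look like the sketch below; {{read_precision}} and the surrounding field names other than {{precision}} are assumptions made for the example:

```python
import json


def read_precision(type_json):
    """Return the decimal precision as an int, whether encoded as a number or a string."""
    value = type_json["precision"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise TypeError(f"unsupported precision encoding: {value!r}")
    return int(value)


# Both encodings yield the same integer.
as_number = json.loads('{"name": "decimal", "precision": 3, "scale": 2}')
as_string = json.loads('{"name": "decimal", "precision": "3", "scale": 2}')
print(read_precision(as_number), read_precision(as_string))
```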





[jira] [Created] (ARROW-7669) [CI] [C++] Turn optimizations off on AppVeyor

2020-01-24 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7669:
-

 Summary: [CI] [C++] Turn optimizations off on AppVeyor
 Key: ARROW-7669
 URL: https://issues.apache.org/jira/browse/ARROW-7669
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++, Continuous Integration
Reporter: Antoine Pitrou


Some AppVeyor entries build in release mode (for example MinGW, R). We should 
try to turn compiler optimizations off (e.g. by setting {{CXXFLAGS}}) to make 
those entries faster.






[jira] [Created] (ARROW-7689) [C++] Sporadic Flight test crash on macOS

2020-01-27 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7689:
-

 Summary: [C++] Sporadic Flight test crash on macOS
 Key: ARROW-7689
 URL: https://issues.apache.org/jira/browse/ARROW-7689
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, FlightRPC
Reporter: Antoine Pitrou


See this build:
https://github.com/apache/arrow/pull/6288/checks?check_run_id=409993893

{code}
[--] 2 tests from TestTls
[ RUN  ] TestTls.DoAction
E0127 01:40:23.87112 123145508859904 tls_pthread.cc:26]
assertion failed: 0 == pthread_setspecific(tls->key, (void*)value)
/Users/runner/runners/2.164.0/work/arrow/arrow/cpp/build-support/run-test.sh: 
line 97: 32496 Abort trap: 6   $TEST_EXECUTABLE "$@" 2>&1
 32497 Done| $ROOT/build-support/asan_symbolize.py
 32498 Done| ${CXXFILT:-c++filt}
 32499 Done| 
$ROOT/build-support/stacktrace_addr2line.pl $TEST_EXECUTABLE
 32500 Done| $pipe_cmd 2>&1
 32501 Done| tee $LOGFILE
~/runners/2.164.0/work/arrow/arrow/build/cpp/src/arrow/flight
{code}

This is a gRPC issue, reported here:
https://github.com/grpc/grpc/issues/20311

We should try to bump bundled gRPC version to see if that fixes the issue.

Side note: why aren't we using the homebrew-provided gRPC?





[jira] [Created] (ARROW-7784) [C++] diff.cc is extremely slow to compile

2020-02-06 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7784:
-

 Summary: [C++] diff.cc is extremely slow to compile
 Key: ARROW-7784
 URL: https://issues.apache.org/jira/browse/ARROW-7784
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou


This comes up especially when doing an optimized build. {{diff.cc}} is always 
enabled even if all components are disabled, and it takes multiple seconds to 
compile. 





[jira] [Created] (ARROW-7785) [C++] sparse_tensor.cc is extremely slow to compile

2020-02-06 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7785:
-

 Summary: [C++] sparse_tensor.cc is extremely slow to compile
 Key: ARROW-7785
 URL: https://issues.apache.org/jira/browse/ARROW-7785
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou


This comes up especially when doing an optimized build. {{sparse_tensor.cc}} is 
always enabled even if all components are disabled, and it takes multiple 
seconds to compile.

Using [ClangBuildAnalyzer|https://github.com/aras-p/ClangBuildAnalyzer] I get 
the following results:
{code}
 Files that took longest to codegen (compiler backend):
 66372 ms: build-clang-profile/src/arrow/CMakeFiles/arrow_objlib.dir/sparse_tensor.cc.o
 16457 ms: build-clang-profile/src/arrow/CMakeFiles/arrow_objlib.dir/array/diff.cc.o
  6283 ms: build-clang-profile/src/arrow/CMakeFiles/arrow_objlib.dir/scalar.cc.o
  5284 ms: build-clang-profile/src/arrow/CMakeFiles/arrow_objlib.dir/builder.cc.o
  5090 ms: build-clang-profile/src/arrow/CMakeFiles/arrow_objlib.dir/array/dict_internal.cc.o
{code}






[jira] [Created] (ARROW-7783) [C++] ARROW_DATASET should enable ARROW_COMPUTE

2020-02-06 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7783:
-

 Summary: [C++] ARROW_DATASET should enable ARROW_COMPUTE
 Key: ARROW-7783
 URL: https://issues.apache.org/jira/browse/ARROW-7783
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou


Currently, passing {{-DARROW_DATASET=ON}} to CMake doesn't enable ARROW_COMPUTE, 
which leads to linker errors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7749) [C++] Link some tests together

2020-02-03 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7749:
-

 Summary: [C++] Link some tests together
 Key: ARROW-7749
 URL: https://issues.apache.org/jira/browse/ARROW-7749
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou


With unity builds (ARROW-7725) it may become more beneficial to reduce the 
number of test executables, as several C++ files could be compiled together so 
as to reduce build times.





[jira] [Created] (ARROW-7748) [C++] [Cuda] Cache CUDA contexts

2020-02-03 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7748:
-

 Summary: [C++] [Cuda] Cache CUDA contexts
 Key: ARROW-7748
 URL: https://issues.apache.org/jira/browse/ARROW-7748
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, GPU
Reporter: Antoine Pitrou


CUDA contexts can be expensive to instantiate. Maybe we should cache them 
(probably only the primary contexts).
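
To make the idea concrete, here is a language-neutral sketch in Python of a per-device cache: the expensive factory runs once per device number and later lookups reuse the result. {{ContextCache}} and the {{factory}} callable are hypothetical names for illustration; the real change would live in Arrow's C++ CUDA code and may look quite different:

```python
import threading


class ContextCache:
    """Create one context per device number and reuse it on later lookups."""

    def __init__(self, factory):
        self._factory = factory      # hypothetical expensive constructor
        self._lock = threading.Lock()
        self._contexts = {}

    def get(self, device_number):
        # The lock keeps two threads from creating the same context twice.
        with self._lock:
            if device_number not in self._contexts:
                self._contexts[device_number] = self._factory(device_number)
            return self._contexts[device_number]
```

Holding the lock during creation is the simplest choice; it serializes creation across devices, which is acceptable when contexts are created rarely.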





[jira] [Created] (ARROW-7754) [C++] Result is slow

2020-02-03 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7754:
-

 Summary: [C++] Result is slow
 Key: ARROW-7754
 URL: https://issues.apache.org/jira/browse/ARROW-7754
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou


When converting a short performance-critical function to return a Result 
(instead of returning a Status and filling an out-parameter), I noticed a 
catastrophic performance regression (around 2x or 3x slower).

It seems the current Result implementation is very slow, for several reasons:
- it imposes "safety" features even in release mode, for example on the 
critical path of move operators
- the underlying mpark variant implementation is not optimized for 
performance-critical data structures






[jira] [Created] (ARROW-7665) [R] linuxLibs.R should build in parallel

2020-01-23 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7665:
-

 Summary: [R] linuxLibs.R should build in parallel
 Key: ARROW-7665
 URL: https://issues.apache.org/jira/browse/ARROW-7665
 Project: Apache Arrow
  Issue Type: Wish
  Components: R
Reporter: Antoine Pitrou


It currently seems to compile everything in one thread, which is ghastly slow.





[jira] [Created] (ARROW-7701) [C++] [CI] Flight test error on macOS

2020-01-28 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7701:
-

 Summary: [C++] [CI] Flight test error on macOS
 Key: ARROW-7701
 URL: https://issues.apache.org/jira/browse/ARROW-7701
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Continuous Integration, FlightRPC
Reporter: Antoine Pitrou


See e.g. https://github.com/apache/arrow/pull/6295/checks?check_run_id=412748673

{code}
[ RUN  ] TestTls.DoAction
E0128 12:02:52.140841000 4447722944 ssl_security_connector.cc:275] 
Handshaker factory creation failed with TSI_INVALID_ARGUMENT.
E0128 12:02:52.14259 4447722944 server_secure_chttp2.cc:81]
{"created":"@1580212972.142576000","description":"Unable to create secure 
server with credentials of type 
Ssl.","file":"/Users/runner/runners/2.164.0/work/arrow/arrow/build/cpp/grpc_ep-prefix/src/grpc_ep/src/core/ext/transport/chttp2/server/secure/server_secure_chttp2.cc","file_line":63}
/Users/runner/runners/2.164.0/work/arrow/arrow/cpp/build-support/run-test.sh: 
line 97: 32477 Segmentation fault: 11  $TEST_EXECUTABLE "$@" 2>&1
 32478 Done| $ROOT/build-support/asan_symbolize.py
 32479 Done| ${CXXFILT:-c++filt}
 32480 Done| 
$ROOT/build-support/stacktrace_addr2line.pl $TEST_EXECUTABLE
 32481 Done| $pipe_cmd 2>&1
 32482 Done| tee $LOGFILE
~/runners/2.164.0/work/arrow/arrow/build/cpp/src/arrow/flight
{code}






[jira] [Created] (ARROW-7726) [CI] [C++] Use boost binaries on Windows GHA build

2020-01-30 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7726:
-

 Summary: [CI] [C++] Use boost binaries on Windows GHA build
 Key: ARROW-7726
 URL: https://issues.apache.org/jira/browse/ARROW-7726
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Continuous Integration
Reporter: Antoine Pitrou


In the Github Actions "AMD64 Windows 2019 C++" build, around 10 minutes are 
spent compiling the bundled Boost library from source. We should probably find 
a way to reuse some existing binaries.





[jira] [Created] (ARROW-7725) [C++] Add infrastructure for unity builds and precompiled headers

2020-01-30 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7725:
-

 Summary: [C++] Add infrastructure for unity builds and precompiled 
headers
 Key: ARROW-7725
 URL: https://issues.apache.org/jira/browse/ARROW-7725
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


Unity builds and precompiled headers can be enabled "easily" in CMake 3.16:
https://cmake.org/cmake/help/v3.16/prop_tgt/UNITY_BUILD.html
https://cmake.org/cmake/help/v3.16/command/target_precompile_headers.html

They can make builds faster in some conditions, especially on CI with little 
parallelism and caching.





[jira] [Created] (ARROW-7691) [C++] Verify missing fields when walking Flatbuffers data

2020-01-27 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7691:
-

 Summary: [C++] Verify missing fields when walking Flatbuffers data
 Key: ARROW-7691
 URL: https://issues.apache.org/jira/browse/ARROW-7691
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Affects Versions: 0.15.1
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


This will fix some of the issues detected by OSS-Fuzz.





[jira] [Created] (ARROW-7869) [Python] Boost::system and boost::filesystem not necessary anymore in Python wheels

2020-02-17 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7869:
-

 Summary: [Python] Boost::system and boost::filesystem not 
necessary anymore in Python wheels
 Key: ARROW-7869
 URL: https://issues.apache.org/jira/browse/ARROW-7869
 Project: Apache Arrow
  Issue Type: Task
  Components: Packaging, Python
Reporter: Antoine Pitrou
 Fix For: 1.0.0


Unfortunately it seems we still need boost::regex due to Parquet.





[jira] [Created] (ARROW-7890) [C++] Add Promise / Future implementation

2020-02-19 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7890:
-

 Summary: [C++] Add Promise / Future implementation
 Key: ARROW-7890
 URL: https://issues.apache.org/jira/browse/ARROW-7890
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


{{std::future}} is unfortunately not featureful enough: there is no way to wait 
on several futures at once.
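
For comparison, the capability in question exists in Python's standard library as {{concurrent.futures.wait}}, which blocks on a whole collection of futures. The sketch below only illustrates the kind of API being wished for; it says nothing about how an Arrow C++ implementation would be designed, and {{run_all}} is a name made up for the example:

```python
from concurrent.futures import ALL_COMPLETED, ThreadPoolExecutor, wait


def run_all(tasks):
    """Run callables concurrently and wait on all of their futures at once."""
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = [pool.submit(task) for task in tasks]
        # A single call blocks until every future in the collection is done.
        done, not_done = wait(futures, return_when=ALL_COMPLETED)
        assert not not_done
        return [future.result() for future in futures]
```

Passing {{FIRST_COMPLETED}} instead of {{ALL_COMPLETED}} makes the same call return as soon as any one future finishes.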





[jira] [Created] (ARROW-7911) [C++] Gandiva tests crash when compiled with clang

2020-02-21 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7911:
-

 Summary: [C++] Gandiva tests crash when compiled with clang
 Key: ARROW-7911
 URL: https://issues.apache.org/jira/browse/ARROW-7911
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++ - Gandiva
Reporter: Antoine Pitrou


Recently, Gandiva tests have started to crash when compiled with clang 7.0:
{code}
clang version 7.0.0-3~ubuntu0.18.04.1 (tags/RELEASE_700/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
{code}

The same crashes occur with clang 9.0:
{code}
clang version 9.0.0-2~ubuntu18.04.2 (tags/RELEASE_900/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
{code}

Tests run fine with gcc 7.4.0, though:
{code}
gcc-7 (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
{code}





[jira] [Created] (ARROW-7915) [CI] [Python] Run tests with Python development mode enabled

2020-02-21 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7915:
-

 Summary: [CI] [Python] Run tests with Python development mode 
enabled
 Key: ARROW-7915
 URL: https://issues.apache.org/jira/browse/ARROW-7915
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, Python
Reporter: Antoine Pitrou


Python's "development mode" enables a few runtime checks and warnings; see the 
docs for "{{-X dev}}": https://docs.python.org/3/using/cmdline.html#id5
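
A small sketch of how a CI script could confirm that the mode is really active: it starts a child interpreter with {{-X dev}} and reads back {{sys.flags.dev_mode}} (available since Python 3.7). The helper name is made up for the example, and how Arrow's CI would actually pass the flag is not specified here:

```python
import subprocess
import sys


def dev_mode_enabled_in_child():
    """Start a child interpreter with -X dev and report its sys.flags.dev_mode."""
    result = subprocess.run(
        [sys.executable, "-X", "dev", "-c",
         "import sys; print(sys.flags.dev_mode)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"
```

Setting the {{PYTHONDEVMODE=1}} environment variable has the same effect and may be easier to apply to an existing test command.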





[jira] [Created] (ARROW-7884) [C++][Python] Crash in pq.read_table()

2020-02-19 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7884:
-

 Summary: [C++][Python] Crash in pq.read_table()
 Key: ARROW-7884
 URL: https://issues.apache.org/jira/browse/ARROW-7884
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Reporter: Antoine Pitrou


The following crashes:
{code:python}
>>> import pyarrow.parquet as pq
>>> tab = pq.read_table("../cpp/submodules/parquet-testing/data/nation.dict-malformed.parquet")
{code}

Here is the backtrace:
{code}
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x77805801 in __GI_abort () at abort.c:79
#2  0x7fffb8a18e42 in arrow::util::CerrLog::~CerrLog (this=0x7fff84001690, 
__in_chrg=) at ../src/arrow/util/logging.cc:50
#3  0x7fffb8a18e5e in arrow::util::CerrLog::~CerrLog (this=0x7fff84001690, 
__in_chrg=) at ../src/arrow/util/logging.cc:52
#4  0x7fffb8a18c9f in arrow::util::ArrowLog::~ArrowLog 
(this=0x7fffaeffbf60, __in_chrg=) at 
../src/arrow/util/logging.cc:228
#5  0x7fffb89e1607 in 
arrow::io::internal::SharedExclusiveChecker::LockExclusive 
(this=0x55db1338) at ../src/arrow/io/interfaces.cc:287
#6  0x7fffb89b0b10 in 
arrow::io::internal::ExclusiveLockGuard::ExclusiveLockGuard
 (this=0x7fffaeffbff8, 
lock=0x55db1338) at ../src/arrow/io/concurrency.h:47
#7  0x7fffb89ad20f in 
arrow::io::internal::SharedExclusiveChecker::exclusive_guard 
(this=0x55db1338) at ../src/arrow/io/concurrency.h:74
#8  0x7fffb89cb74b in 
arrow::io::internal::RandomAccessFileConcurrencyWrapper::GetSize
 (this=0x55db1320)
at ../src/arrow/io/concurrency.h:200
#9  0x7fffb4ca61f3 in parquet::SerializedRowGroup::GetColumnPageReader 
(this=0x7fff840013e0, i=2) at ../src/parquet/file_reader.cc:117
#10 0x7fffb4ca2b0e in parquet::RowGroupReader::GetColumnPageReader 
(this=0x7fff840014b0, i=2) at ../src/parquet/file_reader.cc:75
#11 0x7fffb4b03296 in parquet::arrow::FileColumnIterator::NextChunk 
(this=0x7fff84000c10) at ../src/parquet/arrow/reader_internal.h:81
#12 0x7fffb4b06ccc in parquet::arrow::LeafReader::NextRowGroup 
(this=0x7fff84000ef0) at ../src/parquet/arrow/reader.cc:452
#13 0x7fffb4b0677e in parquet::arrow::LeafReader::LeafReader 
(this=0x7fff84000ef0, ctx=std::shared_ptr 
(empty) = {...}, 
field=std::shared_ptr (empty) = {...}, 
input=std::unique_ptr = {...}) at 
../src/parquet/arrow/reader.cc:407
#14 0x7fffb4afbdac in parquet::arrow::GetReader (field=..., 
ctx=std::shared_ptr (use count 2, weak count 0) 
= {...}, 
out=0x7fffaeffc580) at ../src/parquet/arrow/reader.cc:709
#15 0x7fffb4b0425a in parquet::arrow::FileReaderImpl::GetFieldReader 
(this=0x55dbf480, i=2, 
included_leaves=std::shared_ptr, 
std::equal_to, std::allocator >> (use count 5, weak count 0) = {...}, 
row_groups=std::vector of length 1, capacity 1 = {...}, out=0x7fffaeffc580) 
at ../src/parquet/arrow/reader.cc:173
#16 0x7fffb4b04451 in parquet::arrow::FileReaderImpl::ReadSchemaField 
(this=0x55dbf480, i=2, 
included_leaves=std::shared_ptr, 
std::equal_to, std::allocator >> (use count 5, weak count 0) = {...}, 
row_groups=std::vector of length 1, capacity 1 = {...}, 
out_field=0x55dce870, out=0x55dce790) at 
../src/parquet/arrow/reader.cc:186
#17 0x7fffb4afcf7f in 
parquet::arrow::FileReaderImploperator()(int) const 
(__closure=0x55dd4e08, i=2) at ../src/parquet/arrow/reader.cc:810
#18 0x7fffb4b01151 in std::__invoke_impl&, const 
std::vector&, std::shared_ptr*)::&, 
int&>(std::__invoke_other, parquet::arrow::FileReaderImpl:: &, int 
&) (__f=..., __args#0=@0x55dd4e38: 2)
at /usr/include/c++/7/bits/invoke.h:60
#19 0x7fffb4b010dc in 
std::__invoke&, const std::vector&, 
std::shared_ptr*)::&, 
int&>(parquet::arrow::FileReaderImpl:: &, int &) (__fn=..., 
__args#0=@0x55dd4e38: 2) at /usr/include/c++/7/bits/invoke.h:96
#20 0x7fffb4b00fe1 in 
std::_Bind&, const std::vector&, 
std::shared_ptr*)::(int)>::__call(std::tuple<> &&, std::_Index_tuple<0>) (this=0x55dd4e08, __args=...) at 
/usr/include/c++/7/functional:469
#21 0x7fffb4b00b0d in 
std::_Bind&, const std::vector&, 
std::shared_ptr*)::(int)>::operator()<>(void) 
(this=0x55dd4e08) at /usr/include/c++/7/functional:551
#22 0x7fffb4b00742 in std::__invoke_impl&, const std::vector&, 
std::shared_ptr*)::(int)>&>(std::__invoke_other, 
std::_Bind&, const std::vector&, 
std::shared_ptr*)::(int)> &) (__f=...) at 
/usr/include/c++/7/bits/invoke.h:60
#23 0x7fffb4b004c5 in 
std::__invoke&, const std::vector&, 
std::shared_ptr*)::(int)>&>(std::_Bind&, const std::vector&, 
std::shared_ptr*)::(int)> &) (__fn=...) at 
/usr/include/c++/7/bits/invoke.h:96
#24 0x7fffb4b001bb in 

[jira] [Created] (ARROW-7815) [C++] Fix crashes on corrupt IPC input (OSS-Fuzz)

2020-02-10 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7815:
-

 Summary: [C++] Fix crashes on corrupt IPC input (OSS-Fuzz)
 Key: ARROW-7815
 URL: https://issues.apache.org/jira/browse/ARROW-7815
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou
 Fix For: 1.0.0


More issues have been discovered with OSS-Fuzz; we need to enhance input 
validation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7879) [C++][Doc] Add doc for the Device API

2020-02-18 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7879:
-

 Summary: [C++][Doc] Add doc for the Device API
 Key: ARROW-7879
 URL: https://issues.apache.org/jira/browse/ARROW-7879
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou
 Fix For: 1.0.0








[jira] [Created] (ARROW-7531) [C++] Investigate header cost reduction

2020-01-09 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7531:
-

 Summary: [C++] Investigate header cost reduction
 Key: ARROW-7531
 URL: https://issues.apache.org/jira/browse/ARROW-7531
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Reporter: Antoine Pitrou


Using https://github.com/aras-p/ClangBuildAnalyzer we could find out 
the worst offenders in terms of header file parsing cost when compiling.





[jira] [Created] (ARROW-7583) [C++][Flight] Auth handler tests fragile on Windows

2020-01-15 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7583:
-

 Summary: [C++][Flight] Auth handler tests fragile on Windows
 Key: ARROW-7583
 URL: https://issues.apache.org/jira/browse/ARROW-7583
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, FlightRPC
Reporter: Antoine Pitrou


This occurs often on AppVeyor:
{code}
[--] 3 tests from TestAuthHandler
[ RUN  ] TestAuthHandler.PassAuthenticatedCalls
[   OK ] TestAuthHandler.PassAuthenticatedCalls (4 ms)
[ RUN  ] TestAuthHandler.FailUnauthenticatedCalls
..\src\arrow\flight\flight_test.cc(1126): error: Value of: status.message()
Expected: has substring "Invalid token"
  Actual: "Could not write record batch to stream: "
[  FAILED  ] TestAuthHandler.FailUnauthenticatedCalls (3 ms)
[ RUN  ] TestAuthHandler.CheckPeerIdentity
[   OK ] TestAuthHandler.CheckPeerIdentity (2 ms)
[--] 3 tests from TestAuthHandler (10 ms total)
[--] 3 tests from TestBasicAuthHandler
[ RUN  ] TestBasicAuthHandler.PassAuthenticatedCalls
[   OK ] TestBasicAuthHandler.PassAuthenticatedCalls (4 ms)
[ RUN  ] TestBasicAuthHandler.FailUnauthenticatedCalls
..\src\arrow\flight\flight_test.cc(1224): error: Value of: status.message()
Expected: has substring "Invalid token"
  Actual: "Could not write record batch to stream: "
[  FAILED  ] TestBasicAuthHandler.FailUnauthenticatedCalls (4 ms)
[ RUN  ] TestBasicAuthHandler.CheckPeerIdentity
[   OK ] TestBasicAuthHandler.CheckPeerIdentity (3 ms)
[--] 3 tests from TestBasicAuthHandler (11 ms total)
{code}

See e.g. 
https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/30110376/job/vbtd22813g5hlgfl#L2252





[jira] [Created] (ARROW-7592) [C++] Fix crashes on corrupt IPC input

2020-01-16 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7592:
-

 Summary: [C++] Fix crashes on corrupt IPC input
 Key: ARROW-7592
 URL: https://issues.apache.org/jira/browse/ARROW-7592
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


Fix the following issues spotted by OSS-Fuzz:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=20117
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=20124
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=20127

Those are basic missing sanity checks when reading an IPC file.






[jira] [Created] (ARROW-7576) [C++][Dev] Improve fuzzing setup

2020-01-14 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7576:
-

 Summary: [C++][Dev] Improve fuzzing setup
 Key: ARROW-7576
 URL: https://issues.apache.org/jira/browse/ARROW-7576
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++, Developer Tools
Reporter: Antoine Pitrou








[jira] [Created] (ARROW-7577) [C++][CI] Check fuzzer setup in CI

2020-01-14 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7577:
-

 Summary: [C++][CI] Check fuzzer setup in CI
 Key: ARROW-7577
 URL: https://issues.apache.org/jira/browse/ARROW-7577
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++, Continuous Integration
Reporter: Antoine Pitrou


Perhaps as a cron job.





[jira] [Created] (ARROW-7648) [C++] Sanitize local paths on Windows

2020-01-22 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7648:
-

 Summary: [C++] Sanitize local paths on Windows
 Key: ARROW-7648
 URL: https://issues.apache.org/jira/browse/ARROW-7648
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Reporter: Antoine Pitrou


One way or the other, we should try to sanitize local filesystem paths on 
Windows, by converting backslashes into regular slashes.

One place to do it is {{FileSystemFromUri}}. One complication is that 
\-separated paths can fail parsing as a URI, but we only want to sanitize a 
path if we detected it's a local path (by parsing the URI). Perhaps retrying 
on error (sanitize, then parse again) would work.
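
A minimal sketch of one way around the parsing problem, written in Python for brevity rather than C++. It decides whether the input is local with a cheap prefix check before any URI parsing happens. The helper name `sanitize_local_path` and the two heuristics (drive letter or UNC prefix) are assumptions made for illustration; this is not how {{FileSystemFromUri}} behaves.

```python
import re

# A scheme needs at least two characters here, so "C:" is never taken for one.
_URI_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]+://")
# "C:\..." or "C:/..." style prefixes.
_WINDOWS_DRIVE = re.compile(r"^[A-Za-z]:[\\/]")


def sanitize_local_path(path_or_uri: str) -> str:
    """Convert backslashes to slashes, but only for inputs that look local."""
    if _URI_SCHEME.match(path_or_uri):
        # Looks like a real URI (e.g. s3://bucket/key): leave it untouched.
        return path_or_uri
    if _WINDOWS_DRIVE.match(path_or_uri) or path_or_uri.startswith("\\\\"):
        # Drive-letter path or UNC path: safe to normalize separators.
        return path_or_uri.replace("\\", "/")
    return path_or_uri


print(sanitize_local_path("C:\\data\\file.parquet"))
```

Relative paths such as `foo\bar` are deliberately left alone in this sketch, since nothing distinguishes them from other strings without more context.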





[jira] [Created] (ARROW-7650) [C++] Dataset tests not built on Windows

2020-01-22 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7650:
-

 Summary: [C++] Dataset tests not built on Windows
 Key: ARROW-7650
 URL: https://issues.apache.org/jira/browse/ARROW-7650
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, C++ - Dataset
Reporter: Antoine Pitrou


They are explicitly disabled in {{cpp/src/arrow/dataset/CMakeLists.txt}}. Also, 
if we re-enable them, there are many compile errors (on VS 2017).





[jira] [Created] (ARROW-7601) [Doc] [C++] Create a doc page about fuzzing

2020-01-17 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7601:
-

 Summary: [Doc] [C++] Create a doc page about fuzzing
 Key: ARROW-7601
 URL: https://issues.apache.org/jira/browse/ARROW-7601
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++, Documentation
Reporter: Antoine Pitrou


The doc should probably explain how to reproduce issues locally.





[jira] [Created] (ARROW-7621) [Doc] Doc build fails

2020-01-20 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7621:
-

 Summary: [Doc] Doc build fails
 Key: ARROW-7621
 URL: https://issues.apache.org/jira/browse/ARROW-7621
 Project: Apache Arrow
  Issue Type: Bug
  Components: Documentation
Reporter: Antoine Pitrou


{code}
Traceback (most recent call last):
  File "/home/antoine/arrow/dev/docs/source/conf.py", line 422, in <module>
    import pyarrow.flight
  File "/home/antoine/arrow/dev/python/pyarrow/flight.py", line 25, in <module>
    from pyarrow._flight import (  # noqa
ModuleNotFoundError: No module named 'pyarrow._flight'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/antoine/miniconda3/envs/pyarrow/lib/python3.7/site-packages/sphinx/config.py", line 368, in eval_config_file
    execfile_(filename, namespace)
  File "/home/antoine/miniconda3/envs/pyarrow/lib/python3.7/site-packages/sphinx/util/pycompat.py", line 81, in execfile_
    exec(code, _globals)
  File "/home/antoine/arrow/dev/docs/source/conf.py", line 426, in <module>
    pyarrow.flight = sys.modules['pyarrow.flight'] = mock.Mock()
NameError: name 'mock' is not defined
{code}





[jira] [Created] (ARROW-7622) [Format] Mark Tensor and SparseTensor fields required

2020-01-20 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7622:
-

 Summary: [Format] Mark Tensor and SparseTensor fields required
 Key: ARROW-7622
 URL: https://issues.apache.org/jira/browse/ARROW-7622
 Project: Apache Arrow
  Issue Type: Wish
  Components: Format
Affects Versions: 0.15.1
Reporter: Antoine Pitrou
 Fix For: 1.0.0


The Tensor and SparseTensor parts of the format are currently marked 
experimental. This presumably means that they are still allowed to change (and 
indeed they did change one month ago, in ARROW-4225). 

I suggest we take the opportunity to mark some fields required in 
{{Tensor.fbs}} and {{SparseTensor.fbs}}, to make input validation more robust.

cc [~mrkn], [~jacques]  and [~wesm] for opinions.






[jira] [Created] (ARROW-7623) [C++] Update generated flatbuffers files

2020-01-20 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7623:
-

 Summary: [C++] Update generated flatbuffers files
 Key: ARROW-7623
 URL: https://issues.apache.org/jira/browse/ARROW-7623
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Reporter: Antoine Pitrou
 Fix For: 0.16.0


The field added in ARROW-6836 should be reflected in the generated C++ code.





[jira] [Created] (ARROW-7618) [C++] Fix crashes or undefined behaviour on corrupt IPC input

2020-01-20 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7618:
-

 Summary: [C++] Fix crashes or undefined behaviour on corrupt IPC 
input
 Key: ARROW-7618
 URL: https://issues.apache.org/jira/browse/ARROW-7618
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


More input validation issues detected by OSS-Fuzz.





[jira] [Created] (ARROW-7629) [C++][CI] Add fuzz regression files to arrow-testing

2020-01-21 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7629:
-

 Summary: [C++][CI] Add fuzz regression files to arrow-testing
 Key: ARROW-7629
 URL: https://issues.apache.org/jira/browse/ARROW-7629
 Project: Apache Arrow
  Issue Type: Task
  Components: C++, Continuous Integration
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou








[jira] [Created] (ARROW-7630) [C++][CI] Check fuzz crash regressions in CI

2020-01-21 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7630:
-

 Summary: [C++][CI] Check fuzz crash regressions in CI
 Key: ARROW-7630
 URL: https://issues.apache.org/jira/browse/ARROW-7630
 Project: Apache Arrow
  Issue Type: Task
  Components: C++, Continuous Integration
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou








[jira] [Created] (ARROW-7632) [C++] [CI] Improve fuzzing seed corpus

2020-01-21 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7632:
-

 Summary: [C++] [CI] Improve fuzzing seed corpus
 Key: ARROW-7632
 URL: https://issues.apache.org/jira/browse/ARROW-7632
 Project: Apache Arrow
  Issue Type: Task
  Components: C++, Continuous Integration
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


The coverage stats produced by OSS-Fuzz instruct us to guide the fuzzing 
process towards the following areas:
- extension arrays
- tensors
- sparse tensors







[jira] [Created] (ARROW-7633) [C++][CI] Create fuzz targets for tensors and sparse tensors

2020-01-21 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7633:
-

 Summary: [C++][CI] Create fuzz targets for tensors and sparse 
tensors
 Key: ARROW-7633
 URL: https://issues.apache.org/jira/browse/ARROW-7633
 Project: Apache Arrow
  Issue Type: Task
Reporter: Antoine Pitrou


These use separate API calls disjoint from RecordBatchFileReader and 
RecordBatchStreamReader, so it is probably more natural to expose them as 
separate fuzz targets.





[jira] [Created] (ARROW-7566) [CI] Use more recent Miniconda on AppVeyor

2020-01-14 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7566:
-

 Summary: [CI] Use more recent Miniconda on AppVeyor
 Key: ARROW-7566
 URL: https://issues.apache.org/jira/browse/ARROW-7566
 Project: Apache Arrow
  Issue Type: Wish
  Components: Continuous Integration
Reporter: Antoine Pitrou


A newer conda might improve setup speed because of the new package format.





[jira] [Created] (ARROW-7536) [Java] [Dev] `docker-compose pull debian-java` fails

2020-01-09 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7536:
-

 Summary: [Java] [Dev] `docker-compose pull debian-java` fails
 Key: ARROW-7536
 URL: https://issues.apache.org/jira/browse/ARROW-7536
 Project: Apache Arrow
  Issue Type: Bug
  Components: Developer Tools, Java
Reporter: Antoine Pitrou
Assignee: Krisztian Szucs


I get the following error here:
{code}
$ docker-compose pull debian-java
Pulling debian-java ... error

ERROR: for debian-java  manifest for 
apache/arrow-dev:amd64-debian-9-java-8-maven-3.5.4 not found
ERROR: manifest for apache/arrow-dev:amd64-debian-9-java-8-maven-3.5.4 not found
{code}





[jira] [Created] (ARROW-8023) [Website] Write a blog post about the C data interface

2020-03-06 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8023:
-

 Summary: [Website] Write a blog post about the C data interface
 Key: ARROW-8023
 URL: https://issues.apache.org/jira/browse/ARROW-8023
 Project: Apache Arrow
  Issue Type: Task
  Components: Website
Reporter: Antoine Pitrou


At some point we should probably write a blog post about the current fuzzing 
setup. Perhaps when we have fixed all reported crashes :-)





[jira] [Created] (ARROW-8013) [Python][Packaging] Fix manylinux wheels

2020-03-05 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8013:
-

 Summary: [Python][Packaging] Fix manylinux wheels
 Key: ARROW-8013
 URL: https://issues.apache.org/jira/browse/ARROW-8013
 Project: Apache Arrow
  Issue Type: Bug
  Components: Packaging, Python
Reporter: Antoine Pitrou


The manylinux build jobs are failing currently because of ARROW-7917. See for 
example:
https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=7890&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=5b4cc83a-7bb0-5664-5bb1-588f7e4dc05b&l=188






[jira] [Created] (ARROW-8036) [C++] Compilation failure with gtest 1.10.0

2020-03-09 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8036:
-

 Summary: [C++] Compilation failure with gtest 1.10.0
 Key: ARROW-8036
 URL: https://issues.apache.org/jira/browse/ARROW-8036
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou


{code}
../src/arrow/array_test.cc:641:1: error: 'TypedTestCaseIsDeprecated' is 
deprecated: TYPED_TEST_CASE is deprecated, please use TYPED_TEST_SUITE 
[-Werror,-Wdeprecated-declarations]
TYPED_TEST_CASE(TestPrimitiveBuilder, Primitives);
^
{code}





[jira] [Created] (ARROW-8372) [C++] Add Result to table / record batch APIs

2020-04-08 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8372:
-

 Summary: [C++] Add Result to table / record batch APIs
 Key: ARROW-8372
 URL: https://issues.apache.org/jira/browse/ARROW-8372
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Micah Kornfield
Assignee: Antoine Pitrou








[jira] [Created] (ARROW-8389) [Integration] Run tests in parallel

2020-04-09 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8389:
-

 Summary: [Integration] Run tests in parallel
 Key: ARROW-8389
 URL: https://issues.apache.org/jira/browse/ARROW-8389
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, Integration
Reporter: Antoine Pitrou


This follows ARROW-8176.





[jira] [Created] (ARROW-8485) [Integration][Java] Implement extension types integration

2020-04-16 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8485:
-

 Summary: [Integration][Java] Implement extension types integration
 Key: ARROW-8485
 URL: https://issues.apache.org/jira/browse/ARROW-8485
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Integration, Java
Reporter: Antoine Pitrou
 Fix For: 1.0.0








[jira] [Created] (ARROW-8361) [C++] Add Result APIs to Buffer methods and functions

2020-04-07 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8361:
-

 Summary: [C++] Add Result APIs to Buffer methods and functions
 Key: ARROW-8361
 URL: https://issues.apache.org/jira/browse/ARROW-8361
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Micah Kornfield
Assignee: Antoine Pitrou








[jira] [Created] (ARROW-8347) [C++] Add Result to Array methods

2020-04-06 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8347:
-

 Summary: [C++] Add Result to Array methods
 Key: ARROW-8347
 URL: https://issues.apache.org/jira/browse/ARROW-8347
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Micah Kornfield


Buffers, Array builders (anything in the src/arrow root directory)





[jira] [Created] (ARROW-8429) [C++] Fix Buffer::CopySlice on 0-sized buffer

2020-04-13 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8429:
-

 Summary: [C++] Fix Buffer::CopySlice on 0-sized buffer
 Key: ARROW-8429
 URL: https://issues.apache.org/jira/browse/ARROW-8429
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou
 Fix For: 0.17.0


Found by OSS-Fuzz.





[jira] [Created] (ARROW-8441) [C++] Fix crashes on invalid input (OSS-Fuzz)

2020-04-14 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8441:
-

 Summary: [C++] Fix crashes on invalid input (OSS-Fuzz)
 Key: ARROW-8441
 URL: https://issues.apache.org/jira/browse/ARROW-8441
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou
 Fix For: 0.17.0








[jira] [Created] (ARROW-8539) [CI] "AMD64 MacOS 10.15 GLib & Ruby" fails

2020-04-21 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8539:
-

 Summary: [CI] "AMD64 MacOS 10.15 GLib & Ruby" fails
 Key: ARROW-8539
 URL: https://issues.apache.org/jira/browse/ARROW-8539
 Project: Apache Arrow
  Issue Type: Bug
  Components: C, Continuous Integration, GLib
Reporter: Antoine Pitrou


See e.g.
https://github.com/apache/arrow/pull/6991/checks?check_run_id=604703868

{code}
[192/256] Generating arithmetic_ops.bc
FAILED: src/gandiva/precompiled/arithmetic_ops.bc 
cd 
/Users/runner/runners/2.169.0/work/arrow/arrow/build/cpp/src/gandiva/precompiled
 && /usr/local/Cellar/cmake/3.17.1/bin/cmake -E env 
SDKROOT=/Applications/Xcode_11.3.1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk
 /usr/local/opt/llvm@8/bin/clang-8 -std=c++11 -DGANDIVA_IR -DNDEBUG 
-DARROW_STATIC -DGANDIVA_STATIC -fno-use-cxa-atexit -emit-llvm -O3 -c 
/Users/runner/runners/2.169.0/work/arrow/arrow/cpp/src/gandiva/precompiled/arithmetic_ops.cc
 -o 
/Users/runner/runners/2.169.0/work/arrow/arrow/build/cpp/src/gandiva/precompiled/arithmetic_ops.bc
 -isysroot 
/Applications/Xcode_11.3.1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk
 -I/Users/runner/runners/2.169.0/work/arrow/arrow/cpp/src
dyld: Library not loaded: /usr/local/opt/z3/lib/libz3.dylib
  Referenced from: /usr/local/opt/llvm@8/bin/clang-8
  Reason: image not found
Child aborted
{code}






[jira] [Created] (ARROW-8540) [C++] Create memory allocation benchmark

2020-04-21 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8540:
-

 Summary: [C++] Create memory allocation benchmark
 Key: ARROW-8540
 URL: https://issues.apache.org/jira/browse/ARROW-8540
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Reporter: Antoine Pitrou


To gauge the overhead of repeated allocations and deallocations (e.g. for 
temporary computation results).





[jira] [Created] (ARROW-8529) [C++] Fix usage of NextCounts() in GetBatchWithDict[Spaced]

2020-04-20 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8529:
-

 Summary: [C++] Fix usage of NextCounts() in 
GetBatchWithDict[Spaced]
 Key: ARROW-8529
 URL: https://issues.apache.org/jira/browse/ARROW-8529
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


See discussion in ARROW-8486





[jira] [Created] (ARROW-8370) [C++] Add Result to type / schema APIs

2020-04-08 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8370:
-

 Summary: [C++] Add Result to type / schema APIs
 Key: ARROW-8370
 URL: https://issues.apache.org/jira/browse/ARROW-8370
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Micah Kornfield


Buffers, Array builders (anything in the src/arrow root directory)





[jira] [Created] (ARROW-8233) [CI] Build timeouts on "AMD64 Windows MinGW 64 GLib & Ruby "

2020-03-26 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8233:
-

 Summary: [CI] Build timeouts on "AMD64 Windows MinGW 64 GLib & 
Ruby "
 Key: ARROW-8233
 URL: https://issues.apache.org/jira/browse/ARROW-8233
 Project: Apache Arrow
  Issue Type: Bug
  Components: C, Continuous Integration, GLib, Ruby
Reporter: Antoine Pitrou


See for example:
https://github.com/apache/arrow/runs/535319644?check_suite_focus=true
https://github.com/apache/arrow/runs/535245619?check_suite_focus=true






[jira] [Created] (ARROW-8298) [C++][CI] MinGW builds fail building grpc

2020-03-31 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8298:
-

 Summary: [C++][CI] MinGW builds fail building grpc
 Key: ARROW-8298
 URL: https://issues.apache.org/jira/browse/ARROW-8298
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Continuous Integration
Reporter: Antoine Pitrou
 Fix For: 0.17.0


See e.g. https://github.com/apache/arrow/pull/6781/checks?check_run_id=548816924
{code}
CMake Error at 
D:/a/arrow/arrow/build/cpp/grpc_ep-prefix/src/grpc_ep-stamp/grpc_ep-build-RELEASE.cmake:62
 (message):
  Command failed: 2

   'D:/a/arrow/msys64/usr/bin/make'

  See also


D:/a/arrow/arrow/build/cpp/grpc_ep-prefix/src/grpc_ep-stamp/grpc_ep-build-*.log


make[2]: *** [CMakeFiles/grpc_ep.dir/build.make:130: 
grpc_ep-prefix/src/grpc_ep-stamp/grpc_ep-build] Error 1
make[1]: *** [CMakeFiles/Makefile2:1012: CMakeFiles/grpc_ep.dir/all] Error 2
make: *** [Makefile:158: all] Error 2
{code}






[jira] [Created] (ARROW-8272) [CI][Python] Test failure on Ubuntu 16.04

2020-03-30 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8272:
-

 Summary: [CI][Python] Test failure on Ubuntu 16.04
 Key: ARROW-8272
 URL: https://issues.apache.org/jira/browse/ARROW-8272
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, Python
Reporter: Antoine Pitrou


See https://github.com/pitrou/arrow/runs/545291564





[jira] [Created] (ARROW-8553) [C++] Reimplement BitmapAnd using Bitmap::VisitWords

2020-04-22 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8553:
-

 Summary: [C++] Reimplement BitmapAnd using Bitmap::VisitWords
 Key: ARROW-8553
 URL: https://issues.apache.org/jira/browse/ARROW-8553
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.17.0
Reporter: Antoine Pitrou


Currently, {{BitmapAnd}} uses a bit-by-bit loop for unaligned inputs. Using 
{{Bitmap::VisitWords}} instead would probably yield a manyfold performance 
increase.
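
To illustrate why a word-wise loop should beat a bit-by-bit loop on unaligned inputs, here is a hedged Python sketch (not the C++ implementation, and not the {{Bitmap::VisitWords}} API). It assumes least-significant-bit numbering within each byte, as Arrow bitmaps use, and all function names are hypothetical. The word-wise version handles up to 64 bits per iteration by shifting each input into alignment first.

```python
WORD_BITS = 64


def get_bit(bitmap: bytes, i: int) -> int:
    """Read bit i, counting from the least significant bit of each byte."""
    return (bitmap[i // 8] >> (i % 8)) & 1


def bitmap_and_bitwise(left, left_offset, right, right_offset, length) -> int:
    """Reference version: one loop iteration per bit. Bit 0 of the result comes first."""
    out = 0
    for i in range(length):
        bit = get_bit(left, left_offset + i) & get_bit(right, right_offset + i)
        out |= bit << i
    return out


def read_word(bitmap: bytes, bit_offset: int, nbits: int) -> int:
    """Read nbits (at most 64) starting at an arbitrary, possibly unaligned, bit offset."""
    first_byte = bit_offset // 8
    shift = bit_offset % 8
    nbytes = (shift + nbits + 7) // 8
    chunk = int.from_bytes(bitmap[first_byte:first_byte + nbytes], "little")
    return (chunk >> shift) & ((1 << nbits) - 1)


def bitmap_and_wordwise(left, left_offset, right, right_offset, length) -> int:
    """Word-wise version: one AND per 64-bit word, whatever the input offsets are."""
    out = 0
    for start in range(0, length, WORD_BITS):
        nbits = min(WORD_BITS, length - start)
        word = (read_word(left, left_offset + start, nbits)
                & read_word(right, right_offset + start, nbits))
        out |= word << start
    return out
```

Both functions return the result packed into a Python integer so that they can be compared directly; for 200 bits the first loop runs 200 times and the second runs 4 times.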





[jira] [Created] (ARROW-8570) [CI] [C++] Link failure with AWS SDK on AppVeyor (Windows)

2020-04-23 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8570:
-

 Summary: [CI] [C++] Link failure with AWS SDK on AppVeyor (Windows)
 Key: ARROW-8570
 URL: https://issues.apache.org/jira/browse/ARROW-8570
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Continuous Integration
Reporter: Antoine Pitrou


See e.g. 
https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/32391335/job/ptbl9h9fffu0s5he
{code}
   Creating library release\arrow_flight.lib and object release\arrow_flight.exp
absl_str_format_internal.lib(float_conversion.cc.obj) : error LNK2019: 
unresolved external symbol __std_reverse_trivially_swappable_1 referenced in 
function "void __cdecl std::_Reverse_unchecked1(char * const,char * 
const,struct std::integral_constant)" 
(??$_Reverse_unchecked1@PEAD@std@@YAXQEAD0U?$integral_constant@_K$00@0@@Z)
absl_strings.lib(charconv_bigint.cc.obj) : error LNK2001: unresolved external 
symbol __std_reverse_trivially_swappable_1
release\arrow_flight.dll : fatal error LNK1120: 1 unresolved externals
{code}

This is probably an issue with a conda-forge package:
https://github.com/conda-forge/grpc-cpp-feedstock/issues/58

In the meantime we could pin {{grpc-cpp}} in our CI configuration.





[jira] [Created] (ARROW-8567) [Python] pa.array() sometimes ignore "safe=False"

2020-04-23 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8567:
-

 Summary: [Python] pa.array() sometimes ignore "safe=False"
 Key: ARROW-8567
 URL: https://issues.apache.org/jira/browse/ARROW-8567
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.17.0
Reporter: Antoine Pitrou


Generally, {{pa.array(data).cast(sometype, safe=...)}} is equivalent to
{{pa.array(data, sometype, safe=...)}}. Consider the following:

{code:python}
>>> pa.array([Decimal('12.34')]).cast(pa.int32(), safe=False)
[
  12
]
>>> pa.array([Decimal('12.34')], pa.int32(), safe=False)
[
  12
]
{code}

However, that is not always the case:
{code:python}
>>> pa.array([Decimal('1234')]).cast(pa.int8(), safe=False)
[
  -46
]
>>> pa.array([Decimal('1234')], pa.int8(), safe=False)
Traceback (most recent call last):
  ...
ArrowInvalid: Value 1234 too large to fit in C integer type

I don't think this is very important: first because you can call cast() 
directly, second because the results are unusable anyway.
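
For reference, the -46 above is what a plain two's-complement truncation of 1234 to 8 bits gives. The following stdlib-only check does not involve pyarrow, and `wrap_to_int8` is a hypothetical helper name; it only shows the arithmetic, under the assumption that the unsafe cast simply keeps the low byte.

```python
def wrap_to_int8(value: int) -> int:
    """Keep the low 8 bits of value and reinterpret them as a signed 8-bit integer."""
    low_byte = value & 0xFF
    return low_byte - 256 if low_byte >= 128 else low_byte


# 1234 = 4 * 256 + 210, and 210 - 256 = -46
print(wrap_to_int8(1234))
```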






[jira] [Created] (ARROW-8568) [C++][Python] Crash on decimal cast in debug mode

2020-04-23 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8568:
-

 Summary: [C++][Python] Crash on decimal cast in debug mode
 Key: ARROW-8568
 URL: https://issues.apache.org/jira/browse/ARROW-8568
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Affects Versions: 0.17.0
Reporter: Antoine Pitrou


{code:python}
>>> arr = pa.array([Decimal('123.45')])
>>> arr
[
  123.45
]
>>> arr.type
Decimal128Type(decimal(5, 2))
>>> arr.cast(pa.decimal128(4, 2))
../src/arrow/util/basic_decimal.cc:626:  Check failed: (original_scale) != (new_scale)
{code}
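
The failed check suggests that the cast requests a rescale even though the source and target scales are both 2 and only the precision shrinks. Below is a hedged sketch of that distinction, using Python's `decimal` module rather than Arrow's decimal classes. The function name `cast_decimal` and its error handling are assumptions for illustration, not Arrow's code: when the scales are equal it skips the rescale step but still checks the precision.

```python
from decimal import Decimal


def cast_decimal(value: Decimal, old_scale: int, new_precision: int, new_scale: int) -> Decimal:
    """Cast a decimal with old_scale to decimal(new_precision, new_scale)."""
    # Work on the unscaled integer, as fixed-width decimal types do.
    unscaled = int(value.scaleb(old_scale))
    if new_scale > old_scale:
        unscaled *= 10 ** (new_scale - old_scale)
    elif new_scale < old_scale:
        unscaled, remainder = divmod(unscaled, 10 ** (old_scale - new_scale))
        if remainder:
            raise ValueError("rescaling would lose digits")
    # Equal scales: nothing to rescale, but the precision check must still run.
    if abs(unscaled) >= 10 ** new_precision:
        raise ValueError(f"{value} does not fit in decimal({new_precision}, {new_scale})")
    return Decimal(unscaled).scaleb(-new_scale)
```

With this shape, casting 123.45 from decimal(5, 2) to decimal(4, 2) raises an ordinary error instead of reaching a rescale with identical scales.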






[jira] [Created] (ARROW-8692) [C++] Avoid memory copies when downloading from S3

2020-05-04 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8692:
-

 Summary: [C++] Avoid memory copies when downloading from S3
 Key: ARROW-8692
 URL: https://issues.apache.org/jira/browse/ARROW-8692
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou








[jira] [Created] (ARROW-8689) [C++] S3 benchmarks fail linking

2020-05-04 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8689:
-

 Summary: [C++] S3 benchmarks fail linking
 Key: ARROW-8689
 URL: https://issues.apache.org/jira/browse/ARROW-8689
 Project: Apache Arrow
  Issue Type: Bug
  Components: Benchmarking, C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


{code}
FAILED: release/arrow-filesystem-s3fs-benchmark 
: && /usr/bin/ccache /usr/bin/g++-7  -Wno-noexcept-type  
-fdiagnostics-color=always -fuse-ld=gold -O3 -DNDEBUG  -Wall -mavx2  
-D_GLIBCXX_USE_CXX11_ABI=1 -D_GLIBCXX_USE_CXX11_ABI=1 -fno-omit-frame-pointer 
-g -O3 -DNDEBUG  -rdynamic 
src/arrow/filesystem/CMakeFiles/arrow-filesystem-s3fs-benchmark.dir/s3fs_benchmark.cc.o
  -o release/arrow-filesystem-s3fs-benchmark  
-Wl,-rpath,/home/antoine/arrow/dev/cpp/build-release/release:/home/antoine/miniconda3/envs/pyarrow/lib
  gbenchmark_ep/src/gbenchmark_ep-install/lib/libbenchmark_main.a  
gbenchmark_ep/src/gbenchmark_ep-install/lib/libbenchmark.a  
release/libarrow_testing.so.18.0.0  
/home/antoine/miniconda3/envs/pyarrow/lib/libcrypto.so  
/home/antoine/miniconda3/envs/pyarrow/lib/libssl.so  
/home/antoine/miniconda3/envs/pyarrow/lib/libbrotlienc-static.a  
/home/antoine/miniconda3/envs/pyarrow/lib/libbrotlidec-static.a  
/home/antoine/miniconda3/envs/pyarrow/lib/libbrotlicommon-static.a  -ldl  
/home/antoine/miniconda3/envs/pyarrow/lib/libgtest_main.so  
/home/antoine/miniconda3/envs/pyarrow/lib/libgtest.so  
/home/antoine/miniconda3/envs/pyarrow/lib/libgmock.so  -ldl  
release/libparquet.so.18.0.0  release/libarrow.so.18.0.0  
/home/antoine/miniconda3/envs/pyarrow/lib/libssl.so  
/home/antoine/miniconda3/envs/pyarrow/lib/libcrypto.so  
/home/antoine/miniconda3/envs/pyarrow/lib/libbrotlienc-static.a  
/home/antoine/miniconda3/envs/pyarrow/lib/libbrotlidec-static.a  
/home/antoine/miniconda3/envs/pyarrow/lib/libbrotlicommon-static.a  
/home/antoine/miniconda3/envs/pyarrow/lib/libaws-cpp-sdk-config.so  
/home/antoine/miniconda3/envs/pyarrow/lib/libaws-cpp-sdk-transfer.so  
/home/antoine/miniconda3/envs/pyarrow/lib/libaws-cpp-sdk-s3.so  
/home/antoine/miniconda3/envs/pyarrow/lib/libaws-cpp-sdk-core.so  
/home/antoine/miniconda3/envs/pyarrow/lib/libaws-c-event-stream.so.1.0.0  
/home/antoine/miniconda3/envs/pyarrow/lib/libaws-c-common.so.1.0.0  -lm  
-lpthread  /home/antoine/miniconda3/envs/pyarrow/lib/libaws-checksums.so  -ldl  
jemalloc_ep-prefix/src/jemalloc_ep/dist//lib/libjemalloc_pic.a  
mimalloc_ep/src/mimalloc_ep/lib/mimalloc-1.0/libmimalloc-release.a  -pthread  
-lrt  -Wl,-rpath-link,/home/antoine/miniconda3/envs/pyarrow/lib && :
/home/antoine/miniconda3/envs/pyarrow/include/boost/filesystem/path.hpp:792: 
error: undefined reference to 
'boost::filesystem::path::operator/=(boost::filesystem::path const&)'
/home/antoine/miniconda3/envs/pyarrow/include/boost/filesystem/operations.hpp:461:
 error: undefined reference to 
'boost::filesystem::detail::status(boost::filesystem::path const&, 
boost::system::error_code*)'
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8453) [Integration][Go] Recursive nested types unsupported

2020-04-14 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8453:
-

 Summary: [Integration][Go] Recursive nested types unsupported
 Key: ARROW-8453
 URL: https://issues.apache.org/jira/browse/ARROW-8453
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go, Integration
Reporter: Antoine Pitrou


The Go JSON integration implementation doesn't support recursive nested types, 
e.g. "list(list(int32))".

Here is an example traceback when Go is the consumer:
{code}
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/apache/arrow/go/arrow/internal/arrjson.dtypeFromJSON(0xc1687c, 
0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/arrow/go/arrow/internal/arrjson/arrjson.go:238 +0x1710
github.com/apache/arrow/go/arrow/internal/arrjson.dtypeFromJSON(0xc16858, 
0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/arrow/go/arrow/internal/arrjson/arrjson.go:238 +0x838
github.com/apache/arrow/go/arrow/internal/arrjson.fieldFromJSON(0xc16860, 
0xb, 0xc16858, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/arrow/go/arrow/internal/arrjson/arrjson.go:309 +0xb5
github.com/apache/arrow/go/arrow/internal/arrjson.fieldsFromJSON(0xcca280, 
0x4, 0x4, 0x0, 0x6f6d08, 0xc0db60)
/arrow/go/arrow/internal/arrjson/arrjson.go:301 +0xfe
github.com/apache/arrow/go/arrow/internal/arrjson.schemaFromJSON(0xcca280, 
0x4, 0x4, 0xc0db60)
/arrow/go/arrow/internal/arrjson/arrjson.go:274 +0x3f
github.com/apache/arrow/go/arrow/internal/arrjson.NewReader(0x5b4700, 
0xc0e028, 0x0, 0x0, 0x0, 0x0, 0x0, 0xd0)
/arrow/go/arrow/internal/arrjson/reader.go:56 +0x13d
main.validate(0x7ffbc819, 0x37, 0x7ffbc857, 0x26, 0x4acf01, 0x0, 0x0)
/arrow/go/arrow/ipc/cmd/arrow-json-integration-test/main.go:181 +0x1c8
main.runCommand(0x7ffbc857, 0x26, 0x7ffbc819, 0x37, 0x7ffbc884, 
0x8, 0xc16101, 0xc86260, 0x40568f)
/arrow/go/arrow/ipc/cmd/arrow-json-integration-test/main.go:65 +0x228
main.main()
/arrow/go/arrow/ipc/cmd/arrow-json-integration-test/main.go:44 +0x24e
{code}

When Go is the producer:
{code}
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/apache/arrow/go/arrow/internal/arrjson.dtypeFromJSON(0xc1687c, 
0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/arrow/go/arrow/internal/arrjson/arrjson.go:238 +0x1710
github.com/apache/arrow/go/arrow/internal/arrjson.dtypeFromJSON(0xc1686c, 
0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/arrow/go/arrow/internal/arrjson/arrjson.go:238 +0x838
github.com/apache/arrow/go/arrow/internal/arrjson.fieldFromJSON(0xc16860, 
0xb, 0xc1686c, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/arrow/go/arrow/internal/arrjson/arrjson.go:309 +0xb5
github.com/apache/arrow/go/arrow/internal/arrjson.fieldsFromJSON(0xcca280, 
0x4, 0x4, 0x0, 0x6f6d08, 0xc0db60)
/arrow/go/arrow/internal/arrjson/arrjson.go:301 +0xfe
github.com/apache/arrow/go/arrow/internal/arrjson.schemaFromJSON(0xcca280, 
0x4, 0x4, 0xc0db60)
/arrow/go/arrow/internal/arrjson/arrjson.go:274 +0x3f
github.com/apache/arrow/go/arrow/internal/arrjson.NewReader(0x5b4700, 
0xc0e028, 0x0, 0x0, 0x0, 0x0, 0x0, 0xcc37a1760fc5b719)
/arrow/go/arrow/internal/arrjson/reader.go:56 +0x13d
main.cnvToARROW(0x7ffbc814, 0x37, 0x7ffbc852, 0x26, 0x4acf01, 0x0, 0x0)
/arrow/go/arrow/ipc/cmd/arrow-json-integration-test/main.go:137 +0x319
main.runCommand(0x7ffbc852, 0x26, 0x7ffbc814, 0x37, 0x7ffbc87f, 
0xd, 0xc16101, 0xc86260, 0x40568f)
/arrow/go/arrow/ipc/cmd/arrow-json-integration-test/main.go:63 +0x172
main.main()
/arrow/go/arrow/ipc/cmd/arrow-json-integration-test/main.go:44 +0x24e
{code}
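Both traces show dtypeFromJSON calling itself once before the index error. As a rough illustration only (written in Python rather than Go, using a simplified field layout with "name", "type", "nullable" and "children" keys that is assumed here, and a hypothetical helper dtype_from_json that is not the arrjson function), a reader that recurses through the children of each nested field could look like this:

```python
def dtype_from_json(field):
    """Return a type string such as 'list(list(int32))' for a JSON field dict.

    Hypothetical helper: the field layout is a simplified assumption,
    not the exact integration JSON schema.
    """
    type_name = field["type"]["name"]
    children = field.get("children", [])
    if type_name == "list":
        # Check the child count up front instead of indexing blindly.
        if len(children) != 1:
            raise ValueError(
                f"list field {field['name']!r} needs exactly one child, "
                f"got {len(children)}"
            )
        # Recurse with the child's own dict, so its children travel with it.
        return f"list({dtype_from_json(children[0])})"
    if type_name == "struct":
        inner = ", ".join(f"{c['name']}: {dtype_from_json(c)}" for c in children)
        return f"struct({inner})"
    return type_name


nested = {
    "name": "f",
    "type": {"name": "list"},
    "nullable": True,
    "children": [{
        "name": "item",
        "type": {"name": "list"},
        "nullable": True,
        "children": [{
            "name": "item",
            "type": {"name": "int32"},
            "nullable": True,
            "children": [],
        }],
    }],
}
type_string = dtype_from_json(nested)
```

The explicit child-count check turns a malformed document into a readable error rather than an out-of-range panic.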






[jira] [Created] (ARROW-8450) [Integration][C++] Implement large list/binary/utf8 integration

2020-04-14 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8450:
-

 Summary: [Integration][C++] Implement large list/binary/utf8 
integration
 Key: ARROW-8450
 URL: https://issues.apache.org/jira/browse/ARROW-8450
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Integration
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou








[jira] [Created] (ARROW-8452) [Go][Integration] Go JSON producer generates incorrect nullable flag for nested types

2020-04-14 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8452:
-

 Summary: [Go][Integration] Go JSON producer generates incorrect 
nullable flag for nested types
 Key: ARROW-8452
 URL: https://issues.apache.org/jira/browse/ARROW-8452
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go, Integration
Reporter: Antoine Pitrou


It seems that when generating JSON integration data for a nested type, e.g.
"list(int32)", the list's nullable flag is also inherited by child fields. This 
is wrong, because child fields have independent nullable flags, e.g. you may 
have:
* "list(field("ints", int32, nullable=True), nullable=True)"
* "list(field("ints", int32, nullable=False), nullable=True)"
* "list(field("ints", int32, nullable=True), nullable=False)"
* "list(field("ints", int32, nullable=False), nullable=False)"
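The four combinations above can be sketched as follows. This is Python rather than Go, the JSON layout is a simplified assumption, and field_to_json is a hypothetical helper, not the Go producer. The point is only that each field writes its own flag and nothing is inherited from the parent.

```python
def field_to_json(name, type_name, nullable, children=()):
    """Serialize one field; `children` holds (name, type_name, nullable) tuples.

    Hypothetical helper with a simplified layout, for illustration only.
    """
    return {
        "name": name,
        "type": {"name": type_name},
        # This field's own flag, independent of any parent field.
        "nullable": nullable,
        "children": [field_to_json(*child) for child in children],
    }


combos = []
for list_nullable in (True, False):
    for item_nullable in (True, False):
        f = field_to_json("f", "list", list_nullable,
                          [("ints", "int32", item_nullable)])
        # Record (parent flag, child flag) as they appear in the output.
        combos.append((f["nullable"], f["children"][0]["nullable"]))
```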






[jira] [Created] (ARROW-8767) [C++] Make ThreadPool task ordering configurable

2020-05-11 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8767:
-

 Summary: [C++] Make ThreadPool task ordering configurable
 Key: ARROW-8767
 URL: https://issues.apache.org/jira/browse/ARROW-8767
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Reporter: Antoine Pitrou


We may want to choose a task ordering strategy when constructing a ThreadPool.

To make the ordering strategy configurable, we may want to externalize it in a 
separate JobQueue class.
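A minimal sketch of that separation, in Python rather than C++. The class names FifoJobQueue and LifoJobQueue and the push/pop interface are assumptions made for illustration, not an existing Arrow API. The pool only ever calls push and pop, so the ordering strategy lives entirely in the queue object passed at construction.

```python
import threading
from collections import deque


class FifoJobQueue:
    """Oldest job first."""
    def __init__(self):
        self._jobs = deque()
    def push(self, job):
        self._jobs.append(job)
    def pop(self):
        return self._jobs.popleft()
    def __len__(self):
        return len(self._jobs)


class LifoJobQueue(FifoJobQueue):
    """Newest job first."""
    def pop(self):
        return self._jobs.pop()


class ThreadPool:
    def __init__(self, num_threads, job_queue):
        self._queue = job_queue          # ordering strategy is injected here
        self._cond = threading.Condition()
        self._closed = False
        self._threads = [threading.Thread(target=self._worker)
                         for _ in range(num_threads)]
        for t in self._threads:
            t.start()

    def submit(self, fn, *args):
        with self._cond:
            if self._closed:
                raise RuntimeError("pool is shut down")
            self._queue.push((fn, args))
            self._cond.notify()

    def shutdown(self):
        """Stop accepting jobs, drain what is queued, join the workers."""
        with self._cond:
            self._closed = True
            self._cond.notify_all()
        for t in self._threads:
            t.join()

    def _worker(self):
        while True:
            with self._cond:
                while len(self._queue) == 0 and not self._closed:
                    self._cond.wait()
                if len(self._queue) == 0:
                    return
                fn, args = self._queue.pop()
            fn(*args)


def run_order(job_queue):
    """Queue three jobs behind a blocked single worker and record run order."""
    pool = ThreadPool(1, job_queue)
    started, gate, order = threading.Event(), threading.Event(), []

    def blocker():
        started.set()
        gate.wait()

    pool.submit(blocker)
    started.wait()                 # the only worker is now busy
    for i in range(3):
        pool.submit(order.append, i)
    gate.set()
    pool.shutdown()
    return order
```

With one worker, run_order(FifoJobQueue()) runs the queued jobs in submission order and run_order(LifoJobQueue()) runs them in reverse, without any change to ThreadPool itself.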






[jira] [Created] (ARROW-8763) [C++] Create RandomAccessFile::WillNeed-like API

2020-05-11 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8763:
-

 Summary: [C++] Create RandomAccessFile::WillNeed-like API
 Key: ARROW-8763
 URL: https://issues.apache.org/jira/browse/ARROW-8763
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou


We need a way to tell a RandomAccessFile that a given range, or a set of 
ranges, will be needed soon.
That method should also be called from MemoryMappedFile::Read and friends.

Also perhaps write specialized ReadAsync implementations?
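To make the idea concrete, here is a small Python sketch (not the C++ API). The RandomAccessFile class and its will_need and read_at methods are hypothetical names chosen for illustration. It assumes a POSIX system, forwards the hint to os.posix_fadvise with POSIX_FADV_WILLNEED where that call exists, and silently does nothing elsewhere, since the hint is purely advisory.

```python
import os
import tempfile


class RandomAccessFile:
    """Hypothetical wrapper around a read-only file descriptor."""

    def __init__(self, path):
        self._fd = os.open(path, os.O_RDONLY)

    def will_need(self, ranges):
        """Hint that each (offset, length) range will be read soon."""
        if not hasattr(os, "posix_fadvise"):
            return  # advisory only: a no-op where the hint is unavailable
        for offset, length in ranges:
            os.posix_fadvise(self._fd, offset, length, os.POSIX_FADV_WILLNEED)

    def read_at(self, offset, length):
        # Positional read; does not move a shared file cursor.
        return os.pread(self._fd, length, offset)

    def close(self):
        os.close(self._fd)


# Usage: announce two ranges, then read from them.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 64)   # 16 KiB of repeating 0..255

f = RandomAccessFile(tmp.name)
f.will_need([(0, 16), (4096, 32)])
first = f.read_at(0, 4)
later = f.read_at(4096, 2)
f.close()
os.unlink(tmp.name)
```

Taking a list of ranges in one call lets an implementation coalesce or reorder the hints before issuing them.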





[jira] [Created] (ARROW-8764) [C++] Make ThreadPool configurable in ReadRangeCache

2020-05-11 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8764:
-

 Summary: [C++] Make ThreadPool configurable in ReadRangeCache
 Key: ARROW-8764
 URL: https://issues.apache.org/jira/browse/ARROW-8764
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou








[jira] [Created] (ARROW-8765) [C++] Design Scheduler API

2020-05-11 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8765:
-

 Summary: [C++] Design Scheduler API
 Key: ARROW-8765
 URL: https://issues.apache.org/jira/browse/ARROW-8765
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou








[jira] [Created] (ARROW-8847) [C++] Pass task size / metrics in Executor API

2020-05-18 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8847:
-

 Summary: [C++] Pass task size / metrics in Executor API
 Key: ARROW-8847
 URL: https://issues.apache.org/jira/browse/ARROW-8847
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


For now, our ThreadPool implementation would ignore those metrics, but other 
implementations may use them for custom ordering.

Example metrics:
* IO size (number of bytes)
* CPU cost (~ number of instructions)
* Priority (opaque integer? lower is more urgent)
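One possible shape for such an API, sketched in Python rather than C++. TaskHints, InlineExecutor and PriorityExecutor are hypothetical names, and the three fields simply mirror the example metrics listed above. One executor accepts the hints and ignores them; the other uses the priority for ordering.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskHints:
    io_size: Optional[int] = None    # number of bytes
    cpu_cost: Optional[int] = None   # rough number of instructions
    priority: int = 0                # opaque integer, lower is more urgent


class InlineExecutor:
    """Accepts hints but ignores them: tasks run in submission order."""
    def __init__(self):
        self._tasks = []

    def submit(self, fn, hints=TaskHints()):
        self._tasks.append(fn)

    def run_all(self):
        results = [fn() for fn in self._tasks]
        self._tasks.clear()
        return results


class PriorityExecutor:
    """Uses hints.priority for ordering; ties keep submission order."""
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def submit(self, fn, hints=TaskHints()):
        # The sequence number breaks ties, so callables are never compared.
        heapq.heappush(self._heap, (hints.priority, next(self._seq), fn))

    def run_all(self):
        results = []
        while self._heap:
            _, _, fn = heapq.heappop(self._heap)
            results.append(fn())
        return results


def demo(executor):
    executor.submit(lambda: "a", TaskHints(priority=5))
    executor.submit(lambda: "b", TaskHints(priority=1, io_size=1 << 20))
    executor.submit(lambda: "c", TaskHints(priority=3))
    return executor.run_all()
```

Because the hints are an optional argument with a neutral default, callers that do not care about ordering are unaffected.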







[jira] [Created] (ARROW-8848) [CI][C/Glib] MinGW build error

2020-05-18 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8848:
-

 Summary: [CI][C/Glib] MinGW build error
 Key: ARROW-8848
 URL: https://issues.apache.org/jira/browse/ARROW-8848
 Project: Apache Arrow
  Issue Type: Bug
  Components: C, Continuous Integration, GLib
Reporter: Antoine Pitrou


https://github.com/apache/arrow/pull/7216/checks?check_run_id=686415769#step:6:66
{code}
:: Processing package changes...
upgrading msys2-runtime...
  0 [main] pacman (5056) 
C:\hostedtoolcache\windows\Ruby\2.6.6\x64\msys64\usr\bin\pacman.exe: *** fatal 
error - cygheap base mismatch detected - 0x180330408/0x180317408.
This problem is probably due to using incompatible versions of the cygwin DLL.
Search for cygwin1.dll using the Windows Start->Find/Search facility
and delete all but the most recent version.  The most recent version *should*
reside in x:\cygwin\bin, where 'x' is the drive on which you have
installed the cygwin distribution.  Rebooting is also suggested if you
are unable to find another cygwin DLL.
  0 [main] pacman 246 dofork: child -1 - forked process 5056 died 
unexpectedly, retry 0, exit code 0xC142, errno 11
error: could not open file 
/var/cache/pacman/pkg/msys2-runtime-3.1.4-1-x86_64.pkg.tar.zst: Can't 
initialize filter; unable to run program "zstd -d -qq"
error: could not commit transaction
error: failed to commit transaction (transaction aborted)
Errors occurred, no packages were upgraded.
MSYS2 could not be found. Please run 'ridk install'
or download and install MSYS2 manually from https://msys2.github.io/
{code}





[jira] [Created] (ARROW-8846) [Dev][Python] Autoformat Python sources with Archery

2020-05-18 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8846:
-

 Summary: [Dev][Python] Autoformat Python sources with Archery
 Key: ARROW-8846
 URL: https://issues.apache.org/jira/browse/ARROW-8846
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Archery, Developer Tools, Python
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou








[jira] [Created] (ARROW-8781) [CI][C++] Enable ccache on GHA MinGW jobs

2020-05-13 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8781:
-

 Summary: [CI][C++] Enable ccache on GHA MinGW jobs
 Key: ARROW-8781
 URL: https://issues.apache.org/jira/browse/ARROW-8781
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++, Continuous Integration
Reporter: Antoine Pitrou


It would be nice to enable caching with ccache on the MinGW Github Actions 
jobs. They're currently quite slow...





[jira] [Created] (ARROW-8798) [C++] Fix Parquet crashes on invalid input (OSS-Fuzz)

2020-05-14 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8798:
-

 Summary: [C++] Fix Parquet crashes on invalid input (OSS-Fuzz)
 Key: ARROW-8798
 URL: https://issues.apache.org/jira/browse/ARROW-8798
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou








[jira] [Created] (ARROW-8145) [C++] Rename GetTargetInfos

2020-03-18 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8145:
-

 Summary: [C++] Rename GetTargetInfos
 Key: ARROW-8145
 URL: https://issues.apache.org/jira/browse/ARROW-8145
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++, Python
Reporter: Antoine Pitrou


Sorry, but I think I'm irked by the new "GetTargetInfos" spelling.
I suggest either "GetTargetInfo" or "GetFileInfo" (both singular).





[jira] [Created] (ARROW-8146) [C++] Add per-filesystem facility to sanitize a path

2020-03-18 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8146:
-

 Summary: [C++] Add per-filesystem facility to sanitize a path
 Key: ARROW-8146
 URL: https://issues.apache.org/jira/browse/ARROW-8146
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou








[jira] [Created] (ARROW-8194) [CI] Github Actions Windows job should run tests in parallel

2020-03-24 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8194:
-

 Summary: [CI] Github Actions Windows job should run tests in 
parallel
 Key: ARROW-8194
 URL: https://issues.apache.org/jira/browse/ARROW-8194
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++, Continuous Integration
Reporter: Antoine Pitrou


Currently, the GHA Windows job runs tests using {{-j 1}}. But IIRC GHA exposes 
two CPU cores.





[jira] [Created] (ARROW-8195) [CI] Remove Boost download step in Github Actions

2020-03-24 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8195:
-

 Summary: [CI] Remove Boost download step in Github Actions
 Key: ARROW-8195
 URL: https://issues.apache.org/jira/browse/ARROW-8195
 Project: Apache Arrow
  Issue Type: Task
  Components: C++, Continuous Integration
Reporter: Antoine Pitrou


According to https://github.com/actions/virtual-environments/issues/370, the 
full version of Boost should now be properly installed on the GHA Windows 2019 
image. We should try to remove the Boost download step, which is quite slow (it 
installs 2GB worth of files).





[jira] [Created] (ARROW-8198) [C++] Diffing should handle null arrays

2020-03-24 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8198:
-

 Summary: [C++] Diffing should handle null arrays
 Key: ARROW-8198
 URL: https://issues.apache.org/jira/browse/ARROW-8198
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou


When {{AssertArraysEqual}} fails between arrays of type null, you get the 
rather unhelpful message:
{code}
../src/arrow/compare.cc:964:  Check failed: _s.ok() Operation failed: 
PrintDiff(left, right, opts.diff_sink())
Bad status: NotImplemented: formatting diffs between arrays of type null
In ../src/arrow/array/diff.cc, line 453, code: VisitTypeInline(type, this)
In ../src/arrow/array/diff.cc, line 825, code: (_error_or_value10).status()
In ../src/arrow/compare.cc, line 955, code: (_error_or_value4).status()
{code}





[jira] [Created] (ARROW-8704) [C++] Fix Parquet crash on invalid input (OSS-Fuzz)

2020-05-05 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8704:
-

 Summary: [C++] Fix Parquet crash on invalid input (OSS-Fuzz)
 Key: ARROW-8704
 URL: https://issues.apache.org/jira/browse/ARROW-8704
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.17.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou








[jira] [Created] (ARROW-8732) [C++] Let Futures support cancellation

2020-05-07 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8732:
-

 Summary: [C++] Let Futures support cancellation
 Key: ARROW-8732
 URL: https://issues.apache.org/jira/browse/ARROW-8732
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


There should be a way for consumers of Futures to signal that they are no 
longer interested in the task at hand. For some kinds of tasks this may allow 
cancelling the task in-flight (e.g. an IO task, or a task consisting of 
multiple steps).
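A small Python sketch of the multi-step case (not the C++ Future). The Future class, the Cancelled exception and run_steps are hypothetical names for illustration. The consumer only sets a flag; the task checks it between steps and stops early, so a step that is already running is never interrupted.

```python
import threading


class Cancelled(Exception):
    pass


class Future:
    def __init__(self):
        self._done = threading.Event()
        self._cancel_requested = threading.Event()
        self._result = None
        self._error = None

    def cancel(self):
        """Consumer side: signal lack of interest. Advisory only."""
        self._cancel_requested.set()

    def cancel_requested(self):
        return self._cancel_requested.is_set()

    def _finish(self, result=None, error=None):
        self._result, self._error = result, error
        self._done.set()

    def result(self):
        self._done.wait()
        if self._error is not None:
            raise self._error
        return self._result


def run_steps(future, steps):
    """Producer side: run steps in order, checking for cancellation between them."""
    done = []
    try:
        for step in steps:
            if future.cancel_requested():
                raise Cancelled(f"cancelled after {len(done)} step(s)")
            done.append(step())
        future._finish(result=done)
    except Exception as exc:
        future._finish(error=exc)


# Usage: cancel while the first step is still in flight.
fut = Future()
first_step_running = threading.Event()
release = threading.Event()

def slow_step():
    first_step_running.set()
    release.wait()
    return "step 1"

worker = threading.Thread(target=run_steps,
                          args=(fut, [slow_step, lambda: "step 2"]))
worker.start()
first_step_running.wait()
fut.cancel()          # the consumer loses interest
release.set()         # the first step is allowed to complete
worker.join()
try:
    fut.result()
    outcome = "completed"
except Cancelled as exc:
    outcome = str(exc)
```

Tasks that never check the flag simply run to completion, which matches the "for some kinds of tasks" wording above.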





[jira] [Created] (ARROW-8716) CLONE - [Integration][Java] Fix map type to allow non-standard field names

2020-05-06 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8716:
-

 Summary: CLONE - [Integration][Java] Fix map type to allow 
non-standard field names
 Key: ARROW-8716
 URL: https://issues.apache.org/jira/browse/ARROW-8716
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Integration, Java
Reporter: Antoine Pitrou
 Fix For: 1.0.0


Java should support the integration test added in ARROW-7173.





[jira] [Created] (ARROW-8715) [Integration][Java] Implement extension types integration

2020-05-06 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8715:
-

 Summary: [Integration][Java] Implement extension types integration
 Key: ARROW-8715
 URL: https://issues.apache.org/jira/browse/ARROW-8715
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Integration, Java
Reporter: Antoine Pitrou
 Fix For: 1.0.0


Java should support the extension type integration tests added in ARROW-5649.





[jira] [Created] (ARROW-8871) [C++] Gandiva build failure

2020-05-20 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8871:
-

 Summary: [C++] Gandiva build failure
 Key: ARROW-8871
 URL: https://issues.apache.org/jira/browse/ARROW-8871
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, C++ - Gandiva
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


Looks like there was an undetected conflict between Gandiva changes to use the 
Arrow parsing internals, and the refactor of said parsing internals.





[jira] [Created] (ARROW-8872) [CI] Travis-CI jobs fail on github fork (can't open file 'ci/detect-changes.py')

2020-05-20 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8872:
-

 Summary: [CI] Travis-CI jobs fail on github fork (can't open file 
'ci/detect-changes.py')
 Key: ARROW-8872
 URL: https://issues.apache.org/jira/browse/ARROW-8872
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration
Reporter: Antoine Pitrou


See example here:
https://travis-ci.org/github/pitrou/arrow/builds/689168956

Excerpt:
{code}
$ eval "$(python ci/detect-changes.py)"

python: can't open file 'ci/detect-changes.py': [Errno 2] No such file or 
directory
{code}






[jira] [Created] (ARROW-9013) [C++] Validate enum-style CMake options

2020-06-02 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-9013:
-

 Summary: [C++] Validate enum-style CMake options
 Key: ARROW-9013
 URL: https://issues.apache.org/jira/browse/ARROW-9013
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Developer Tools
Reporter: Antoine Pitrou


It seems that some CMake options silently allow invalid values, such as 
{{-DARROW_SIMD_LEVEL=foobar}}. We should validate inputs to avoid typos (such 
as "SSE42" instead of "SSE4_2").
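The check itself is simple. The sketch below shows the logic in Python purely for illustration (the real fix would live in the CMake scripts). validate_option is a hypothetical helper and the SIMD_LEVELS tuple is an illustrative list, not necessarily the full set of values Arrow accepts.

```python
import difflib

# Illustrative values only; not necessarily Arrow's actual list.
SIMD_LEVELS = ("NONE", "SSE4_2", "AVX2", "AVX512")


def validate_option(name, value, allowed):
    """Return `value` if it is one of `allowed`, else raise with a hint."""
    if value in allowed:
        return value
    message = f"{name}={value!r} is not valid; expected one of: {', '.join(allowed)}"
    # Suggest the closest allowed spelling, if any is reasonably similar.
    close = difflib.get_close_matches(value, allowed, n=1)
    if close:
        message += f" (did you mean {close[0]!r}?)"
    raise ValueError(message)


try:
    validate_option("ARROW_SIMD_LEVEL", "SSE42", SIMD_LEVELS)
    error_text = ""
except ValueError as exc:
    error_text = str(exc)
```

Failing at configure time with the list of accepted values makes the typo visible immediately instead of silently falling back to some default.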





[jira] [Created] (ARROW-9090) [C++] Bump versions of bundled libraries

2020-06-10 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-9090:
-

 Summary: [C++] Bump versions of bundled libraries
 Key: ARROW-9090
 URL: https://issues.apache.org/jira/browse/ARROW-9090
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Reporter: Antoine Pitrou
 Fix For: 1.0.0


We should bump the versions of bundled dependencies, wherever possible, to 
ensure that users get bugfixes and improvements made in those third-party 
libraries.





[jira] [Created] (ARROW-9101) [Doc][C++][Python] Document encoding expected by CSV and JSON readers

2020-06-11 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-9101:
-

 Summary: [Doc][C++][Python] Document encoding expected by CSV and 
JSON readers
 Key: ARROW-9101
 URL: https://issues.apache.org/jira/browse/ARROW-9101
 Project: Apache Arrow
  Issue Type: Task
  Components: C++, Documentation, Python
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou
 Fix For: 1.0.0








[jira] [Created] (ARROW-9106) [C++] Add C++ foundation to ease file transcoding

2020-06-11 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-9106:
-

 Summary: [C++] Add C++ foundation to ease file transcoding
 Key: ARROW-9106
 URL: https://issues.apache.org/jira/browse/ARROW-9106
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


In some situations (e.g. reading a Windows-produced CSV file), the user may need 
to transcode data before ingesting it into Arrow. Rather than build transcoding 
into C++ (which would require a library of encodings), we could delegate it to 
bindings as needed, by providing a generic InputStream facility.
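On the bindings side, the transcoding part could be as small as the following Python sketch. TranscodingInputStream is a hypothetical class built only on the standard codecs and io modules; how it would be plugged into the C++ InputStream facility is not shown and is left open here. It decodes chunks from the source encoding incrementally and re-encodes them as UTF-8, so multi-byte sequences split across chunk boundaries are handled by the incremental decoder.

```python
import codecs
import io


class TranscodingInputStream(io.RawIOBase):
    """Read-only stream yielding `dst_encoding` bytes from a `src_encoding` source."""

    def __init__(self, raw, src_encoding, dst_encoding="utf-8", chunk_size=4096):
        self._raw = raw
        self._decoder = codecs.getincrementaldecoder(src_encoding)()
        self._encoder = codecs.getincrementalencoder(dst_encoding)()
        self._chunk_size = chunk_size
        self._pending = b""   # transcoded bytes not yet handed to the caller
        self._eof = False

    def readable(self):
        return True

    def readinto(self, buf):
        # Refill until we have output or the source is exhausted.
        while not self._pending and not self._eof:
            chunk = self._raw.read(self._chunk_size)
            self._eof = not chunk
            text = self._decoder.decode(chunk, final=self._eof)
            self._pending = self._encoder.encode(text, final=self._eof)
        n = min(len(buf), len(self._pending))
        buf[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n   # 0 signals end of stream


# Usage: a Windows-1252 CSV fragment read back as UTF-8 bytes.
source = "café;naïve\r\n".encode("cp1252")
stream = TranscodingInputStream(io.BytesIO(source), "cp1252", chunk_size=3)
utf8_bytes = stream.read()
```

Since the class implements the standard raw I/O interface, anything that accepts a Python file object can consume it without knowing a transcoding step is involved.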





[jira] [Created] (ARROW-9109) [Python][Packaging] Enable S3 support in manylinux wheels

2020-06-11 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-9109:
-

 Summary: [Python][Packaging] Enable S3 support in manylinux wheels
 Key: ARROW-9109
 URL: https://issues.apache.org/jira/browse/ARROW-9109
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Packaging, Python
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou
 Fix For: 1.0.0








[jira] [Created] (ARROW-9094) [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-10 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-9094:
-

 Summary: [Python] Bump versions of compiled dependencies in 
manylinux wheels
 Key: ARROW-9094
 URL: https://issues.apache.org/jira/browse/ARROW-9094
 Project: Apache Arrow
  Issue Type: Task
  Components: Packaging, Python
Reporter: Antoine Pitrou
 Fix For: 1.0.0







