[jira] [Created] (ARROW-9093) [FlightRPC][C++][Python] Allow setting gRPC client options

2020-06-10 Thread David Li (Jira)
David Li created ARROW-9093:
---

 Summary: [FlightRPC][C++][Python] Allow setting gRPC client options
 Key: ARROW-9093
 URL: https://issues.apache.org/jira/browse/ARROW-9093
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC, Python
Reporter: David Li
Assignee: David Li


There's currently no way to set generic gRPC options, which are useful for 
tuning behavior (e.g. round-robin load balancing). Rather than bind each of 
these one by one, we could expose gRPC's mechanism for setting arguments as 
generic string-string or string-integer pairs (and leave the interpretation 
implementation-dependent).
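
As a rough sketch of what such an interface might accept (Python, standard 
library only): {{normalize_generic_options}} is a hypothetical helper, not an 
existing Flight API, and the two option names are gRPC channel argument keys 
used purely as examples. It only validates the pairs and passes them through, 
leaving the interpretation to the implementation.

```python
from typing import Iterable, List, Tuple, Union

OptionValue = Union[str, int]


def normalize_generic_options(
    options: Iterable[Tuple[str, OptionValue]],
) -> List[Tuple[str, OptionValue]]:
    """Check that every option is a (str, str-or-int) pair.

    Hypothetical helper for illustration; not part of Flight or gRPC.
    """
    normalized = []
    for name, value in options:
        if not isinstance(name, str):
            raise TypeError(f"option name must be a string, got {name!r}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (str, int)):
            raise TypeError(f"option {name!r} must be a string or integer")
        normalized.append((name, value))
    return normalized


opts = normalize_generic_options([
    ("grpc.lb_policy_name", "round_robin"),
    ("grpc.max_receive_message_length", 16 * 1024 * 1024),
])
print(opts)
```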



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8975) [FlightRPC][C++] Fix flaky MacOS tests

2020-05-28 Thread David Li (Jira)
David Li created ARROW-8975:
---

 Summary: [FlightRPC][C++] Fix flaky MacOS tests
 Key: ARROW-8975
 URL: https://issues.apache.org/jira/browse/ARROW-8975
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, FlightRPC
Affects Versions: 0.17.1
Reporter: David Li
Assignee: David Li


The gRPC MacOS tests have been flaking again.

According to [https://github.com/grpc/grpc/issues/20311], the underlying issue 
may have been fixed, but [https://github.com/grpc/grpc/issues/13856] reports 
that it hasn't (in some configurations?). I will try a few things in CI, or 
else just disable the tests on MacOS.





[jira] [Created] (ARROW-8958) [FlightRPC][Python] Implement Flight DoExchange for Python

2020-05-26 Thread David Li (Jira)
David Li created ARROW-8958:
---

 Summary: [FlightRPC][Python] Implement Flight DoExchange for Python
 Key: ARROW-8958
 URL: https://issues.apache.org/jira/browse/ARROW-8958
 Project: Apache Arrow
  Issue Type: New Feature
  Components: FlightRPC, Python
Reporter: David Li
Assignee: David Li
 Fix For: 1.0.0








[jira] [Created] (ARROW-8957) [FlightRPC][C++] Fail to build due to IpcOptions

2020-05-26 Thread David Li (Jira)
David Li created ARROW-8957:
---

 Summary: [FlightRPC][C++] Fail to build due to IpcOptions
 Key: ARROW-8957
 URL: https://issues.apache.org/jira/browse/ARROW-8957
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, FlightRPC
Reporter: David Li
Assignee: David Li
 Fix For: 1.0.0








[jira] [Created] (ARROW-8889) [Python] Python 3.7 SIGSEGV when comparing RecordBatch to None

2020-05-22 Thread David Li (Jira)
David Li created ARROW-8889:
---

 Summary: [Python] Python 3.7 SIGSEGV when comparing RecordBatch to 
None
 Key: ARROW-8889
 URL: https://issues.apache.org/jira/browse/ARROW-8889
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.17.1
Reporter: David Li


This seems to only happen for Python 3.6 and 3.7. It doesn't happen with 3.8. 
It seems to happen even when built from source, but I used the wheels for this 
reproduction.
{noformat}
> uname -a
Linux chaconne 5.6.13-arch1-1 #1 SMP PREEMPT Thu, 14 May 2020 06:52:53 + 
x86_64 GNU/Linux
> python --version
Python 3.7.7
> pip freeze
numpy==1.18.4
pyarrow==0.17.1{noformat}
Reproduction:
{code:python}
import pyarrow as pa
table = pa.Table.from_arrays([pa.array([1,2,3])], names=["a"])
batches = table.to_batches()
batches[0].equals(None)
{code}
{noformat}
#0  0x7fffdf9d34f0 in arrow::RecordBatch::num_columns() const () from 
/home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/libarrow.so.17
#1  0x7fffdf9d69e9 in arrow::RecordBatch::Equals(arrow::RecordBatch const&, 
bool) const () from 
/home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/libarrow.so.17
#2  0x7fffe084a6e0 in 
__pyx_pw_7pyarrow_3lib_11RecordBatch_31equals(_object*, _object*, _object*) () 
from 
/home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so
#3  0x556b97e4 in _PyMethodDef_RawFastCallKeywords 
(method=0x7fffe0c1b760 <__pyx_methods_7pyarrow_3lib_RecordBatch+288>, 
self=0x7fffdefd7110, args=0x7786f5c8, nargs=, 
kwnames=)
at /tmp/build/80754af9/python_1585000375785/work/Objects/call.c:694
#4  0x556c06af in _PyMethodDescr_FastCallKeywords 
(descrobj=0x7fffdefa4050, args=0x7786f5c0, nargs=2, kwnames=0x0) at 
/tmp/build/80754af9/python_1585000375785/work/Objects/descrobject.c:288
#5  0x55724add in call_function (kwnames=0x0, oparg=2, 
pp_stack=) at 
/tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:4593
#6  _PyEval_EvalFrameDefault (f=, throwflag=) at 
/tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3110
#7  0x55669289 in _PyEval_EvalCodeWithName (_co=0x778a68a0, 
globals=, locals=, args=, 
argcount=, kwnames=0x0, kwargs=0x0, kwcount=, 
kwstep=2, 
defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at 
/tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3930
#8  0x5566a1c4 in PyEval_EvalCodeEx (_co=, 
globals=, locals=, args=, 
argcount=, kws=, kwcount=0, defs=0x0, defcount=0, 
kwdefs=0x0, 
closure=0x0) at 
/tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3959
#9  0x5566a1ec in PyEval_EvalCode (co=, 
globals=, locals=) at 
/tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:524
#10 0x55780cb4 in run_mod (mod=, filename=, globals=0x778d7c30, locals=0x778d7c30, flags=, 
arena=)
at /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:1035
#11 0x5578b0d1 in PyRun_FileExFlags (fp=0x558c24d0, 
filename_str=, start=, globals=0x778d7c30, 
locals=0x778d7c30, closeit=1, flags=0x7fffe1b0)
at /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:988
#12 0x5578b2c3 in PyRun_SimpleFileExFlags (fp=0x558c24d0, 
filename=, closeit=1, flags=0x7fffe1b0) at 
/tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:429
#13 0x5578c3f5 in pymain_run_file (p_cf=0x7fffe1b0, 
filename=0x558e51f0 L"repro.py", fp=0x558c24d0) at 
/tmp/build/80754af9/python_1585000375785/work/Modules/main.c:462
#14 pymain_run_filename (cf=0x7fffe1b0, pymain=0x7fffe2c0) at 
/tmp/build/80754af9/python_1585000375785/work/Modules/main.c:1641
#15 pymain_run_python (pymain=0x7fffe2c0) at 
/tmp/build/80754af9/python_1585000375785/work/Modules/main.c:2902
#16 pymain_main (pymain=0x7fffe2c0) at 
/tmp/build/80754af9/python_1585000375785/work/Modules/main.c:3442
#17 0x5578c51c in _Py_UnixMain (argc=, argv=) at /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:3477
#18 0x77dcd002 in __libc_start_main () from /usr/lib/libc.so.6
#19 0x5572fac0 in _start () at ../sysdeps/x86_64/elf/start.S:103
{noformat}





[jira] [Created] (ARROW-8879) [FlightRPC][Java] FlightStream should unwrap ExecutionExceptions

2020-05-21 Thread David Li (Jira)
David Li created ARROW-8879:
---

 Summary: [FlightRPC][Java] FlightStream should unwrap 
ExecutionExceptions
 Key: ARROW-8879
 URL: https://issues.apache.org/jira/browse/ARROW-8879
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC, Java
Affects Versions: 0.17.1
Reporter: David Li
Assignee: David Li


Currently FlightStream bubbles a lot of exceptions as RuntimeException or 
ExecutionException, or just wraps them with CallStatus.INTERNAL. For 
RuntimeException, we should always check if it's a gRPC StatusRuntimeException 
and convert to the equivalent Flight exception; for ExecutionException, we 
should check if the _cause_ is a gRPC exception and convert.

Example: on master, FlightStream#getDescriptor reports all errors as 
CallStatus.INTERNAL, but we should inspect ExecutionException#getCause instead.

This is needed so that errors get properly reported, e.g. if a service sends a 
PERMISSION_DENIED error, the client should get that and not a RuntimeException, 
ExecutionException, or INTERNAL error.
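
A rough, language-neutral sketch of the unwrapping logic, written in Python for 
brevity. All class and function names here are stand-ins invented for 
illustration ({{GrpcStatusError}} for gRPC's StatusRuntimeException, 
{{WrappedError}} for ExecutionException, {{FlightError}} for the Flight 
exception); none of this is the actual Java code.

```python
class GrpcStatusError(Exception):
    """Stand-in for gRPC's StatusRuntimeException (hypothetical)."""

    def __init__(self, code, message=""):
        super().__init__(message)
        self.code = code


class WrappedError(Exception):
    """Stand-in for java.util.concurrent.ExecutionException (hypothetical)."""


class FlightError(Exception):
    """Stand-in for the Flight-level exception (hypothetical)."""

    def __init__(self, code, message=""):
        super().__init__(message)
        self.code = code


def to_flight_error(exc):
    # Case 1: the exception itself is a gRPC status error.
    if isinstance(exc, GrpcStatusError):
        return FlightError(exc.code, str(exc))
    # Case 2: a wrapper whose cause is a gRPC status error.
    cause = exc.__cause__
    if isinstance(exc, WrappedError) and isinstance(cause, GrpcStatusError):
        return FlightError(cause.code, str(cause))
    # Anything else is still reported as INTERNAL.
    return FlightError("INTERNAL", str(exc))


try:
    try:
        raise GrpcStatusError("PERMISSION_DENIED", "no access")
    except GrpcStatusError as inner:
        raise WrappedError("call failed") from inner
except WrappedError as outer:
    converted = to_flight_error(outer)

print(converted.code)
```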





[jira] [Created] (ARROW-8858) [FlightRPC] Ensure headers are uniformly exposed

2020-05-19 Thread David Li (Jira)
David Li created ARROW-8858:
---

 Summary: [FlightRPC] Ensure headers are uniformly exposed
 Key: ARROW-8858
 URL: https://issues.apache.org/jira/browse/ARROW-8858
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC, Java, Python
Affects Versions: 0.17.0
Reporter: David Li
Assignee: David Li


* Java: MetadataAdapter should support iterating through binary headers
* Python: binary headers need to be present in the output





[jira] [Created] (ARROW-8776) [FlightRPC][C++] Flight/C++ middleware don't receive headers on failed calls to Java servers

2020-05-12 Thread David Li (Jira)
David Li created ARROW-8776:
---

 Summary: [FlightRPC][C++] Flight/C++ middleware don't receive 
headers on failed calls to Java servers
 Key: ARROW-8776
 URL: https://issues.apache.org/jira/browse/ARROW-8776
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, FlightRPC
Affects Versions: 0.17.0
Reporter: David Li
Assignee: David Li
 Fix For: 1.0.0


For a failed call, gRPC/Java may consolidate headers with trailers, so 
Flight/C++ needs to check both headers and trailers to get any headers that may 
have been sent.





[jira] [Created] (ARROW-8775) [C++][FlightRPC] Integration client doesn't run integration tests

2020-05-12 Thread David Li (Jira)
David Li created ARROW-8775:
---

 Summary: [C++][FlightRPC] Integration client doesn't run 
integration tests
 Key: ARROW-8775
 URL: https://issues.apache.org/jira/browse/ARROW-8775
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, FlightRPC, Integration
Reporter: David Li
Assignee: David Li


Looks like I rebased badly.





[jira] [Created] (ARROW-8749) [C++] IpcFormatWriter writes dictionary batches with wrong ID

2020-05-09 Thread David Li (Jira)
David Li created ARROW-8749:
---

 Summary: [C++] IpcFormatWriter writes dictionary batches with 
wrong ID
 Key: ARROW-8749
 URL: https://issues.apache.org/jira/browse/ARROW-8749
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.17.0
Reporter: David Li


IpcFormatWriter assigns dictionary IDs once when it writes the schema message. 
Then, when it writes dictionary batches, it assigns dictionary IDs again 
because it re-collects dictionaries from the given batch. So for example, if 
you have 5 dictionaries, the first dictionary will end up with ID 0 but be 
written with ID 5.

For example, this will fail with "'_error_or_value11.status()' failed with Key 
error: No record of dictionary type with id 9"
{code:cpp}
TEST_F(TestMetadata, DoPutDictionaries) {
  ASSERT_OK_AND_ASSIGN(auto sink, arrow::io::BufferOutputStream::Create());
  std::shared_ptr<Schema> schema = ExampleDictSchema();
  BatchVector expected_batches;
  ASSERT_OK(ExampleDictBatches(&expected_batches));
  ASSERT_OK_AND_ASSIGN(auto writer,
                       arrow::ipc::NewStreamWriter(sink.get(), schema));
  for (auto& batch : expected_batches) {
    ASSERT_OK(writer->WriteRecordBatch(*batch));
  }
  ASSERT_OK_AND_ASSIGN(auto buf, sink->Finish());
  arrow::io::BufferReader source(buf);
  ASSERT_OK_AND_ASSIGN(auto reader,
                       arrow::ipc::RecordBatchStreamReader::Open(&source));
  AssertSchemaEqual(schema, reader->schema());
  for (auto& batch : expected_batches) {
    ASSERT_OK_AND_ASSIGN(auto actual, reader->Next());
    AssertBatchesEqual(*actual, *batch);
  }
}{code}
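
To illustrate the ID mismatch described above, here is a toy model in Python 
(standard library only). {{DictionaryIdAssigner}} and {{MemoizingAssigner}} are 
hypothetical classes, not the IpcFormatWriter code; they only mimic "assign a 
fresh ID each time dictionaries are collected" versus "remember IDs already 
assigned".

```python
import itertools


class DictionaryIdAssigner:
    """Toy model: hands out a fresh ID for every dictionary it is shown."""

    def __init__(self):
        self._next_id = itertools.count()

    def assign(self, dictionaries):
        return {name: next(self._next_id) for name in dictionaries}


class MemoizingAssigner(DictionaryIdAssigner):
    """Possible fix sketch: reuse the ID a dictionary already has."""

    def __init__(self):
        super().__init__()
        self._known = {}

    def assign(self, dictionaries):
        for name in dictionaries:
            if name not in self._known:
                self._known[name] = next(self._next_id)
        return {name: self._known[name] for name in dictionaries}


dictionaries = ["d0", "d1", "d2", "d3", "d4"]

buggy = DictionaryIdAssigner()
schema_ids = buggy.assign(dictionaries)  # IDs recorded with the schema
batch_ids = buggy.assign(dictionaries)   # IDs used for dictionary batches
print(schema_ids["d0"], batch_ids["d0"])

fixed = MemoizingAssigner()
fixed_schema_ids = fixed.assign(dictionaries)
fixed_batch_ids = fixed.assign(dictionaries)
print(fixed_schema_ids == fixed_batch_ids)
```

With 5 dictionaries, the second collection in the toy model hands out IDs 5 
through 9, none of which the schema knows about.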





[jira] [Created] (ARROW-8691) [C++] Don't re-initialize Minio in every s3fs benchmark

2020-05-04 Thread David Li (Jira)
David Li created ARROW-8691:
---

 Summary: [C++] Don't re-initialize Minio in every s3fs benchmark
 Key: ARROW-8691
 URL: https://issues.apache.org/jira/browse/ARROW-8691
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: David Li


Minio is initialized in a googlebenchmark fixture, which means that every 
benchmark will start a new server and re-generate all the data. We should 
control this more carefully so that we can at least share data within a 
benchmark run, since all benchmarks use the same files.





[jira] [Created] (ARROW-8666) [Java] DenseUnionVector has no way to set offset/validity directly

2020-05-01 Thread David Li (Jira)
David Li created ARROW-8666:
---

 Summary: [Java] DenseUnionVector has no way to set offset/validity 
directly
 Key: ARROW-8666
 URL: https://issues.apache.org/jira/browse/ARROW-8666
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Affects Versions: 0.17.0
Reporter: David Li


You can set the type ID manually, but you cannot set the offset or validity 
directly. Ideally, we'd have an API like Python that lets us build it directly 
from constituent vectors and the offsets/type IDs.





[jira] [Created] (ARROW-8665) [Java] DenseUnionWriter#setPosition fails with NullPointerException

2020-05-01 Thread David Li (Jira)
David Li created ARROW-8665:
---

 Summary: [Java] DenseUnionWriter#setPosition fails with 
NullPointerException
 Key: ARROW-8665
 URL: https://issues.apache.org/jira/browse/ARROW-8665
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Affects Versions: 0.17.0
Reporter: David Li


The writer allocates an array of 128 BaseWriters and always iterates through 
all of them. So unless you have 128 typeIds and have touched every one of 
them, setPosition will throw a NullPointerException.
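
The failure mode can be sketched in Python (standard library only). 
{{UnionWriterSketch}} and {{ChildWriter}} are hypothetical stand-ins, not the 
Java classes; an AttributeError on {{None}} plays the role of the 
NullPointerException, and the guarded loop is just one possible fix.

```python
class ChildWriter:
    """Stand-in for a per-type writer (hypothetical)."""

    def __init__(self):
        self.position = 0

    def set_position(self, index):
        self.position = index


class UnionWriterSketch:
    """Toy model with 128 writer slots, created lazily on first use."""

    def __init__(self):
        self.writers = [None] * 128

    def get_writer(self, type_id):
        if self.writers[type_id] is None:
            self.writers[type_id] = ChildWriter()
        return self.writers[type_id]

    def set_position_unguarded(self, index):
        # Mirrors the reported behavior: every slot is visited, used or not.
        for writer in self.writers:
            writer.set_position(index)

    def set_position_guarded(self, index):
        # Possible fix: skip slots that were never initialized.
        for writer in self.writers:
            if writer is not None:
                writer.set_position(index)


union_writer = UnionWriterSketch()
union_writer.get_writer(3)  # only one type ID is ever touched

try:
    union_writer.set_position_unguarded(7)
except AttributeError as exc:
    print("unguarded loop failed:", exc)

union_writer.set_position_guarded(7)
print(union_writer.get_writer(3).position)
```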





[jira] [Created] (ARROW-8572) [Python] Expose UnionArray.array and other fields

2020-04-23 Thread David Li (Jira)
David Li created ARROW-8572:
---

 Summary: [Python] Expose UnionArray.array and other fields
 Key: ARROW-8572
 URL: https://issues.apache.org/jira/browse/ARROW-8572
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.17.0
Reporter: David Li
Assignee: David Li


Currently in Python, you can construct a UnionArray easily, but getting the 
data back out (without copying) is near-impossible. We should expose the getter 
for UnionArray.array so we can pull out the constituent arrays. We should also 
expose fields like mode while we're at it.

The use case is: in Flight, we'd like to write multiple distinct datasets (with 
distinct schemas) in a single logical call; using UnionArrays lets us combine 
these datasets into a single logical dataset.





[jira] [Created] (ARROW-8555) [FlightRPC][Java] Implement Flight DoExchange for Java

2020-04-22 Thread David Li (Jira)
David Li created ARROW-8555:
---

 Summary: [FlightRPC][Java] Implement Flight DoExchange for Java
 Key: ARROW-8555
 URL: https://issues.apache.org/jira/browse/ARROW-8555
 Project: Apache Arrow
  Issue Type: New Feature
  Components: FlightRPC
Reporter: David Li
Assignee: David Li


As described in the mailing list vote.





[jira] [Created] (ARROW-8487) [FlightRPC][C++] Make it possible to target a specific payload size

2020-04-16 Thread David Li (Jira)
David Li created ARROW-8487:
---

 Summary: [FlightRPC][C++] Make it possible to target a specific 
payload size
 Key: ARROW-8487
 URL: https://issues.apache.org/jira/browse/ARROW-8487
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC
Reporter: David Li


gRPC by default limits message sizes on the wire. While Flight in turn disables 
these limits by default, they're still useful for controlling memory 
consumption. A well-behaved client/server may wish to split up writes to 
respect these limits. However, right now, there's no way to measure the memory 
usage of what you're about to write without serializing it.

With ARROW-5377, we can in theory avoid this by having the writer take control 
of serialization, producing the IpcPayload, then measuring the size and writing 
the payload if the size is as desired. However, Flight doesn't provide such a 
low-level mechanism yet - we'd need to open that up as well.
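
A minimal sketch of the "serialize, measure, then write" idea in Python 
(standard library only). {{split_to_fit}} and the {{serialize}} callback are 
hypothetical; a real implementation would produce IpcPayloads rather than 
joined byte strings, and halving is just one simple way to pick a smaller 
slice.

```python
def split_to_fit(rows, serialize, max_bytes):
    """Serialize rows into payloads no larger than max_bytes.

    A chunk is serialized first; if the result is too large, the chunk is
    halved and each half is retried. Row order is preserved.
    """
    payloads = []
    pending = [rows]
    while pending:
        chunk = pending.pop()
        payload = serialize(chunk)
        if len(payload) <= max_bytes:
            payloads.append(payload)
        elif len(chunk) <= 1:
            raise ValueError("a single row exceeds the size limit")
        else:
            mid = len(chunk) // 2
            # Push the second half first so the first half is handled next.
            pending.append(chunk[mid:])
            pending.append(chunk[:mid])
    return payloads


def join_rows(chunk):
    # Stand-in serializer: comma-joined UTF-8 bytes.
    return ",".join(chunk).encode("utf-8")


payloads = split_to_fit(["aaaa", "bbbb", "cccc", "dddd"], join_rows, max_bytes=9)
print(payloads)
```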





[jira] [Created] (ARROW-8327) [FlightRPC][Java] gRPC trailers may be null

2020-04-03 Thread David Li (Jira)
David Li created ARROW-8327:
---

 Summary: [FlightRPC][Java] gRPC trailers may be null
 Key: ARROW-8327
 URL: https://issues.apache.org/jira/browse/ARROW-8327
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC, Java
Reporter: David Li
Assignee: David Li


This can cause spurious failures.





[jira] [Created] (ARROW-8297) [FlightRPC][C++] Implement Flight DoExchange for C++

2020-03-31 Thread David Li (Jira)
David Li created ARROW-8297:
---

 Summary: [FlightRPC][C++] Implement Flight DoExchange for C++
 Key: ARROW-8297
 URL: https://issues.apache.org/jira/browse/ARROW-8297
 Project: Apache Arrow
  Issue Type: New Feature
  Components: FlightRPC
Reporter: David Li
Assignee: David Li


As described in the mailing list vote.





[jira] [Created] (ARROW-8176) [FlightRPC][Integration] Have Flight services bind to port 0 in integration

2020-03-20 Thread David Li (Jira)
David Li created ARROW-8176:
---

 Summary: [FlightRPC][Integration] Have Flight services bind to 
port 0 in integration
 Key: ARROW-8176
 URL: https://issues.apache.org/jira/browse/ARROW-8176
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC, Integration
Reporter: David Li


In integration tests, instead of allocating a port and then trying to bind to 
it, we should have the Flight server bind to port 0, then have the test runner 
parse out the port. This avoids flakiness due to port collisions. It will also 
let us know when the Flight server has actually started.
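
The port-0 approach can be sketched with plain sockets in Python (standard 
library only). {{bind_to_free_port}} is a hypothetical helper and the printed 
line is only an example format; the actual Flight servers and test runner may 
report the port differently.

```python
import socket


def bind_to_free_port(host="127.0.0.1"):
    """Bind a listening TCP socket to an OS-assigned port and report it."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind((host, 0))  # port 0: the OS picks a free port
        server.listen()
    except OSError:
        server.close()
        raise
    return server, server.getsockname()[1]


server, port = bind_to_free_port()
try:
    # A test runner would parse this line to learn the port; by the time it
    # is printed, the socket is already listening.
    print(f"Server listening on 127.0.0.1:{port}")
    with socket.create_connection(("127.0.0.1", port), timeout=5):
        pass  # connecting works without guessing or racing for a port
finally:
    server.close()
```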





[jira] [Created] (ARROW-8160) [FlightRPC][C++] DoPutPayloadWriter doesn't always expose server error message

2020-03-19 Thread David Li (Jira)
David Li created ARROW-8160:
---

 Summary: [FlightRPC][C++] DoPutPayloadWriter doesn't always expose 
server error message
 Key: ARROW-8160
 URL: https://issues.apache.org/jira/browse/ARROW-8160
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, FlightRPC
Affects Versions: 0.16.0
Reporter: David Li


{noformat}
C:/projects/arrow/cpp/src/arrow/flight/flight_test.cc(1261): error: Value of: 
status.message()
Expected: has substring "Invalid token"
  Actual: "Could not write record batch to stream: "
[  FAILED  ] TestBasicAuthHandler.FailUnauthenticatedCalls (17 ms)
{noformat}

This happens because {{Close()}} calls {{RecordBatchPayloadWriter::Close()}}, 
which calls {{CheckStarted}}, which in turn tries to write data. If the data 
gets flushed and the server responds in time, we'll see a failure during 
writing, causing us to never check the server status (which is the last part of 
{{DoPutPayloadWriter::Close}}). We need to reliably check and expose the gRPC 
status.





[jira] [Created] (ARROW-8152) [C++] IO: split large coalesced reads into smaller ones

2020-03-18 Thread David Li (Jira)
David Li created ARROW-8152:
---

 Summary: [C++] IO: split large coalesced reads into smaller ones
 Key: ARROW-8152
 URL: https://issues.apache.org/jira/browse/ARROW-8152
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: David Li


We have a facility to coalesce small reads, but remote filesystems may also 
benefit from splitting large reads to take advantage of concurrency.
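
As a sketch of the splitting step (Python, standard library only): 
{{split_range}} is a hypothetical helper and {{max_chunk}} a hypothetical 
tuning parameter. Each resulting (offset, length) pair could then be issued as 
its own concurrent read.

```python
def split_range(offset, length, max_chunk):
    """Split one read of `length` bytes at `offset` into smaller reads."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    chunks = []
    end = offset + length
    while offset < end:
        size = min(max_chunk, end - offset)
        chunks.append((offset, size))
        offset += size
    return chunks


print(split_range(0, 10, 4))
```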





[jira] [Created] (ARROW-8151) [Benchmarking][Dataset] Benchmark Parquet read performance with S3File

2020-03-18 Thread David Li (Jira)
David Li created ARROW-8151:
---

 Summary: [Benchmarking][Dataset] Benchmark Parquet read 
performance with S3File
 Key: ARROW-8151
 URL: https://issues.apache.org/jira/browse/ARROW-8151
 Project: Apache Arrow
  Issue Type: Bug
  Components: Benchmarking, C++ - Dataset
Reporter: David Li
Assignee: David Li


We should establish a performance baseline with the current S3File 
implementation and Parquet reader before proceeding with work like PARQUET-1698.





[jira] [Created] (ARROW-8112) [FlightRPC][C++] Some status codes don't round-trip through gRPC

2020-03-13 Thread David Li (Jira)
David Li created ARROW-8112:
---

 Summary: [FlightRPC][C++] Some status codes don't round-trip 
through gRPC
 Key: ARROW-8112
 URL: https://issues.apache.org/jira/browse/ARROW-8112
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, FlightRPC
Affects Versions: 0.16.0
Reporter: David Li
Assignee: David Li


KeyError and AlreadyExists don't fully round-trip, instead becoming UNKNOWN. 
There are others, but we don't attempt to map every Arrow status code to a gRPC 
status, only the ones that closely correspond to a gRPC error.





[jira] [Created] (ARROW-8101) [FlightRPC][Java] Can't read/write only an empty null array

2020-03-12 Thread David Li (Jira)
David Li created ARROW-8101:
---

 Summary: [FlightRPC][Java] Can't read/write only an empty null 
array
 Key: ARROW-8101
 URL: https://issues.apache.org/jira/browse/ARROW-8101
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC, Java
Affects Versions: 0.16.0
Reporter: David Li
Assignee: David Li


This is rather an edge case, but Java/Flight fails with a table consisting of 
only an empty null array, since it has no buffers, and Java assumes this can 
never happen.

{noformat}
Exception in thread "main" org.apache.arrow.flight.FlightRuntimeException: 
CallStatus{code=CANCELLED, cause=java.lang.RuntimeException: Unexpected IO 
Exception, description='Failed to stream message'}
at 
org.apache.arrow.flight.CallStatus.toRuntimeException(CallStatus.java:113)
at 
org.apache.arrow.flight.grpc.StatusUtils.fromGrpcRuntimeException(StatusUtils.java:134)
at 
org.apache.arrow.flight.grpc.StatusUtils.fromThrowable(StatusUtils.java:142)
at 
org.apache.arrow.flight.FlightClient$SetStreamObserver.onError(FlightClient.java:315)
at 
io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:442)
at 
io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at 
io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at 
io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at 
org.apache.arrow.flight.grpc.ClientInterceptorAdapter$FlightClientCallListener.onClose(ClientInterceptorAdapter.java:117)
at 
io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at 
io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at 
io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at 
io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:700)
at 
io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at 
io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at 
io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at 
io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:399)
at 
io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:510)
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:66)
at 
io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:630)
at 
io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$700(ClientCallImpl.java:518)
at 
io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:692)
at 
io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:681)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at 
io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Unexpected IO Exception
at 
org.apache.arrow.flight.ArrowMessage.asInputStream(ArrowMessage.java:334)
at org.apache.arrow.flight.ArrowMessage.access$000(ArrowMessage.java:64)
at 
org.apache.arrow.flight.ArrowMessage$ArrowMessageHolderMarshaller.stream(ArrowMessage.java:382)
at 
org.apache.arrow.flight.ArrowMessage$ArrowMessageHolderMarshaller.stream(ArrowMessage.java:372)
at io.grpc.MethodDescriptor.streamRequest(MethodDescriptor.java:290)
at 
io.grpc.internal.ClientCallImpl.sendMessageInternal(ClientCallImpl.java:473)
at io.grpc.internal.ClientCallImpl.sendMessage(ClientCallImpl.java:457)
at 
io.grpc.ForwardingClientCall.sendMessage(ForwardingClientCall.java:37)
at 
io.grpc.ForwardingClientCall.sendMessage(ForwardingClientCall.java:37)
at 
io.grpc.ForwardingClientCall.sendMessage(ForwardingClientCall.java:37)
at 
io.grpc.stub.ClientCalls$CallToStreamObserverAdapter.onNext(ClientCalls.java:341)
at 
org.apache.arrow.flight.FlightClient$PutObserver.putNext(FlightClient.java:354)
at 
org.apache.arrow.flight.example.integration.IntegrationTestClient.testStream(IntegrationTestClient.java:132)
at 

[jira] [Created] (ARROW-7734) [C++] Segfault when comparing status with and without detail

2020-01-31 Thread David Li (Jira)
David Li created ARROW-7734:
---

 Summary: [C++] Segfault when comparing status with and without 
detail
 Key: ARROW-7734
 URL: https://issues.apache.org/jira/browse/ARROW-7734
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.16.0
Reporter: David Li
Assignee: David Li


I noticed this while working on Flight integration tests. The equality operator 
for Status doesn't check whether the status detail is nullptr before 
dereferencing it.
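
A sketch of the missing guard, in Python (standard library only). 
{{StatusSketch}} is a hypothetical stand-in for the C++ Status class, with 
{{None}} playing the role of a null detail pointer; it only shows the order of 
checks, not the actual fix.

```python
class StatusSketch:
    """Toy status with an optional detail (None means no detail)."""

    def __init__(self, code, message, detail=None):
        self.code = code
        self.message = message
        self.detail = detail

    def __eq__(self, other):
        if not isinstance(other, StatusSketch):
            return NotImplemented
        if (self.code, self.message) != (other.code, other.message):
            return False
        # Guard: only compare details when both sides actually have one.
        if self.detail is None or other.detail is None:
            return self.detail is other.detail
        return self.detail == other.detail


plain = StatusSketch("IOError", "boom")
detailed = StatusSketch("IOError", "boom", detail="extra context")
print(plain == detailed, plain == StatusSketch("IOError", "boom"))
```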





[jira] [Created] (ARROW-7721) [FlightRPC][Java] Flaky Flight auth test

2020-01-29 Thread David Li (Jira)
David Li created ARROW-7721:
---

 Summary: [FlightRPC][Java] Flaky Flight auth test
 Key: ARROW-7721
 URL: https://issues.apache.org/jira/browse/ARROW-7721
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC, Java
Affects Versions: 0.16.0
Reporter: David Li


See 
https://github.com/apache/arrow/commit/8b42288f58caa84a40bb7a13c1731ff919c934f2/checks?check_suite_id=426509031

{noformat}
[ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.171 s 
<<< FAILURE! - in org.apache.arrow.flight.auth.TestBasicAuth
[ERROR] asyncCall  Time elapsed: 0.068 s  <<< ERROR!
java.lang.IllegalStateException: 
Memory was leaked by query. Memory leaked: (65536)
Allocator(ROOT) 0/65536/131584/9223372036854775807 (res/actual/peak/limit)

at 
org.apache.arrow.flight.auth.TestBasicAuth.shutdown(TestBasicAuth.java:152)
{noformat}






[jira] [Created] (ARROW-7663) from_pandas gives TypeError instead of ArrowTypeError in some cases

2020-01-23 Thread David Li (Jira)
David Li created ARROW-7663:
---

 Summary: from_pandas gives TypeError instead of ArrowTypeError in 
some cases
 Key: ARROW-7663
 URL: https://issues.apache.org/jira/browse/ARROW-7663
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.15.1
Reporter: David Li


from_pandas sometimes raises a TypeError with an uninformative error message 
rather than an ArrowTypeError with the full, informative type error for 
mixed-type array columns:

{noformat}
>>> pa.Table.from_pandas(pd.DataFrame({"a": ['a', 1]}))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/table.pxi", line 1177, in pyarrow.lib.Table.from_pandas
  File 
"/Users/lidavidm/Flight/arrow/build/python/lib.macosx-10.12-x86_64-3.7/pyarrow/pandas_compat.py",
 line 575, in dataframe_to_arrays
for c, f in zip(columns_to_convert, convert_fields)]
  File 
"/Users/lidavidm/Flight/arrow/build/python/lib.macosx-10.12-x86_64-3.7/pyarrow/pandas_compat.py",
 line 575, in <listcomp>
for c, f in zip(columns_to_convert, convert_fields)]
  File 
"/Users/lidavidm/Flight/arrow/build/python/lib.macosx-10.12-x86_64-3.7/pyarrow/pandas_compat.py",
 line 566, in convert_column
raise e
  File 
"/Users/lidavidm/Flight/arrow/build/python/lib.macosx-10.12-x86_64-3.7/pyarrow/pandas_compat.py",
 line 560, in convert_column
result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 265, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 80, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 107, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: ("Expected a bytes object, got a 'int' object", 
'Conversion failed for column a with type object')
>>> pa.Table.from_pandas(pd.DataFrame({"a": [1, 'a']}))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/table.pxi", line 1177, in pyarrow.lib.Table.from_pandas
  File 
"/Users/lidavidm/Flight/arrow/build/python/lib.macosx-10.12-x86_64-3.7/pyarrow/pandas_compat.py",
 line 575, in dataframe_to_arrays
for c, f in zip(columns_to_convert, convert_fields)]
  File 
"/Users/lidavidm/Flight/arrow/build/python/lib.macosx-10.12-x86_64-3.7/pyarrow/pandas_compat.py",
 line 575, in <listcomp>
for c, f in zip(columns_to_convert, convert_fields)]
  File 
"/Users/lidavidm/Flight/arrow/build/python/lib.macosx-10.12-x86_64-3.7/pyarrow/pandas_compat.py",
 line 560, in convert_column
result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 265, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 80, in pyarrow.lib._ndarray_to_array
TypeError: an integer is required (got type str)
{noformat}

Noticed on 0.15.1 and on master when we tried to upgrade.





[jira] [Created] (ARROW-7579) [FlightRPC] Make Handshake optional

2020-01-14 Thread David Li (Jira)
David Li created ARROW-7579:
---

 Summary: [FlightRPC] Make Handshake optional
 Key: ARROW-7579
 URL: https://issues.apache.org/jira/browse/ARROW-7579
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC
Reporter: David Li
 Fix For: 1.0.0


We should make it possible to _not_ invoke Handshake for services that don't 
want it. This matters especially when using it with flight-grpc, where the 
standard gRPC authentication mechanisms don't know about Flight and try to 
authenticate the Handshake endpoint; it's easy to forget to configure this 
endpoint to bypass authentication.





[jira] [Created] (ARROW-7538) Clarify actual and desired size in AllocationManager

2020-01-09 Thread David Li (Jira)
David Li created ARROW-7538:
---

 Summary: Clarify actual and desired size in AllocationManager
 Key: ARROW-7538
 URL: https://issues.apache.org/jira/browse/ARROW-7538
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: David Li


As a follow up to the review of ARROW-7329, we should clarify the different 
sizes (desired vs actual size) in AllocationManager: 
https://github.com/apache/arrow/pull/5973#discussion_r354729754





[jira] [Created] (ARROW-7477) [FlightRPC][Java] Flight gRPC service is missing reflection info

2019-12-30 Thread David Li (Jira)
David Li created ARROW-7477:
---

 Summary: [FlightRPC][Java] Flight gRPC service is missing 
reflection info
 Key: ARROW-7477
 URL: https://issues.apache.org/jira/browse/ARROW-7477
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC, Java
Affects Versions: 0.14.1
Reporter: David Li
Assignee: David Li
 Fix For: 1.0.0


When setting up the gRPC service, we mangle the gRPC 
[service descriptor|https://github.com/apache/arrow/blob/master/java/flight/src/main/java/org/apache/arrow/flight/FlightBindingService.java], 
removing reflection information. As a result, gRPC reflection doesn't work, and 
debugging/development tools like 
[grpcurl|https://github.com/fullstorydev/grpcurl/] rely on it. Reflection 
information is also useful for things like authorization/access control based 
on RPC method.





[jira] [Created] (ARROW-7343) Memory leak in Flight ArrowMessage

2019-12-06 Thread David Li (Jira)
David Li created ARROW-7343:
---

 Summary: Memory leak in Flight ArrowMessage
 Key: ARROW-7343
 URL: https://issues.apache.org/jira/browse/ARROW-7343
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC, Java
Affects Versions: 0.15.1
Reporter: David Li
Assignee: David Li


I believe this causes things like ARROW-4765.

If a stream is interrupted or otherwise not drained on the server-side, the 
serialized form of the ArrowMessage (DrainableByteBufInputStream) will sit 
around forever, leaking memory.





[jira] [Created] (ARROW-7254) BaseVariableWidthVector#setSafe appears to make value offsets inconsistent

2019-11-25 Thread David Li (Jira)
David Li created ARROW-7254:
---

 Summary: BaseVariableWidthVector#setSafe appears to make value 
offsets inconsistent
 Key: ARROW-7254
 URL: https://issues.apache.org/jira/browse/ARROW-7254
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Affects Versions: 0.15.1
Reporter: David Li


The following program writes a file that PyArrow either segfaults on (0.14.1) 
or rejects with an error (0.15.1) when reading it: 
{{pyarrow.lib.ArrowInvalid: Column 0: Offset invariant failure at: 2 inconsistent value_offsets for null slot0!=4}}

Calling {{setRowCount}} again, or calling {{setSafe}} with a higher index, fixes 
it. While it seems from the new documentation that we should (must?) call 
{{VectorSchemaRoot#setRowCount}} at the end, I wouldn't have expected to get an 
invalid file by using {{setSafe}}, either. 

Full traceback:
{noformat}
> python3 -c 'import pyarrow as pa; print(pa.ipc.open_stream(open("./test.bin", "rb")).read_pandas())'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/lidavidm/Flight/arrow-5137-auth/java/venv/lib/python3.7/site-packages/pyarrow/ipc.py", line 46, in read_pandas
    table = self.read_all()
  File "pyarrow/ipc.pxi", line 330, in pyarrow.lib._CRecordBatchReader.read_all
  File "pyarrow/public-api.pxi", line 321, in pyarrow.lib.pyarrow_wrap_table
  File "pyarrow/error.pxi", line 78, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 0: Offset invariant failure at: 2 inconsistent value_offsets for null slot0!=4
{noformat}
 
Full program:
{code:java}
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowStreamWriter;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;

public class AsdfTest {

  public static void main(String[] args) throws Exception {
    Schema schema = new Schema(Collections.singletonList(
        Field.nullable("a", new ArrowType.Utf8())));

    try (BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
         VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator)) {
      root.setRowCount(2);
      VarCharVector v = (VarCharVector) root.getVector("a");
      // Only slot 0 is written; slot 1 is left unset.
      v.setSafe(0, "asdf".getBytes(StandardCharsets.UTF_8));
      try (OutputStream output = Files.newOutputStream(Paths.get("./test.bin"))) {
        ArrowStreamWriter writer = new ArrowStreamWriter(root, null, output);
        writer.writeBatch();
        writer.close();
      }
    }
  }
}
{code}

{{v.setNull(1)}} after {{v.setSafe(0, "asdf")}} does not fix it. Using {{set}} 
instead of {{setSafe}} will fail in Java.
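
To make the failure concrete, here is a minimal Python sketch of the check that the 0.15.1 error message seems to describe: a null slot whose end offset differs from its start offset. The function name {{check_offsets}} and the example offsets are assumptions for illustration (the bad layout is only inferred from the "0!=4" in the message). This mirrors the wording of the error, not PyArrow's actual validation code or the Arrow format specification.

```python
def check_offsets(offsets, validity):
    """Return a list of problems found in a variable-width offsets buffer.

    offsets  -- list of n + 1 integers for n slots
    validity -- list of n booleans (False means the slot is null)
    """
    if len(offsets) != len(validity) + 1:
        return ["expected %d offsets, got %d" % (len(validity) + 1, len(offsets))]
    problems = []
    for i, valid in enumerate(validity):
        start, end = offsets[i], offsets[i + 1]
        if not valid and end != start:
            # Mirrors the "inconsistent value_offsets for null slot" wording.
            problems.append("null slot %d: %d != %d" % (i, end, start))
        elif end < start:
            problems.append("slot %d: offsets decrease from %d to %d" % (i, start, end))
    return problems


# Two slots: "asdf" at index 0, null at index 1.
good = check_offsets([0, 4, 4], [True, False])
# Hypothetical bad layout: the final offset was never written and stays 0.
bad = check_offsets([0, 4, 0], [True, False])
print(good)
print(bad)
```

If the real file does look like the hypothetical bad layout, that would be consistent with calling {{setRowCount}} again fixing it, but I have not confirmed this against the Java implementation.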






[jira] [Created] (ARROW-6867) [FlightRPC][Java] Flight server can hang JVM on shutdown

2019-10-12 Thread David Li (Jira)
David Li created ARROW-6867:
---

 Summary: [FlightRPC][Java] Flight server can hang JVM on shutdown
 Key: ARROW-6867
 URL: https://issues.apache.org/jira/browse/ARROW-6867
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC, Java
Affects Versions: 0.15.0
Reporter: David Li
Assignee: David Li
 Fix For: 1.0.0


I noticed this while working on Flight integration tests. FlightService keeps 
an executor, which can hang the JVM on shutdown if the executor itself is not 
shut down.

It's used by Handshake and DoPut.

I think this surfaced because I wrote an AuthHandler that threw an exception.





[jira] [Created] (ARROW-5643) Add ability to override hostname checking

2019-06-18 Thread David Li (JIRA)
David Li created ARROW-5643:
---

 Summary: Add ability to override hostname checking
 Key: ARROW-5643
 URL: https://issues.apache.org/jira/browse/ARROW-5643
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC
Reporter: David Li
Assignee: David Li
 Fix For: 0.14.0


We should add the ability to override hostname checks, so you can connect to 
localhost over TLS but still verify that the certificate is for some other 
domain.

Example: when deploying on Kubernetes with headless services, clients connect 
directly to backend services and do load balancing themselves. Thus all 
instances of an application must present a certificate for the same hostname. 
To do health checks in such an environment, you can't connect to the TLS 
hostname (which may resolve to a different instance); you need to connect to 
localhost, and override the hostname check.
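
As an illustration of the idea, independent of Flight and gRPC, here is a minimal sketch using only Python's standard {{ssl}} module. It connects to one address while checking the certificate against a different hostname. The function names and the example address, port and hostname are hypothetical; this is not the Flight API.

```python
import socket
import ssl


def make_context(cafile=None):
    """Build a client context that verifies certificates and hostnames."""
    context = ssl.create_default_context(cafile=cafile)
    # These are already the defaults; set explicitly to make the intent clear.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


def connect_with_override(address, port, expected_hostname, cafile=None):
    """Connect to `address` but check the certificate against `expected_hostname`."""
    context = make_context(cafile)
    raw = socket.create_connection((address, port))
    try:
        # server_hostname drives both SNI and the hostname check, so it can
        # differ from the address the TCP connection was made to.
        return context.wrap_socket(raw, server_hostname=expected_hostname)
    except Exception:
        raw.close()
        raise
```

A health check could then call something like {{connect_with_override("127.0.0.1", 8443, "flight.example.com")}}, where all three values are placeholders.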



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5529) [Flight] Allow serving with multiple TLS certificates

2019-06-07 Thread David Li (JIRA)
David Li created ARROW-5529:
---

 Summary: [Flight] Allow serving with multiple TLS certificates
 Key: ARROW-5529
 URL: https://issues.apache.org/jira/browse/ARROW-5529
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC
Reporter: David Li
Assignee: David Li


We should allow serving a Flight service with more than one TLS certificate. 
This makes health checking easier in large deployments.





[jira] [Created] (ARROW-5511) [Packaging] Enable Flight in Conda packages

2019-06-04 Thread David Li (JIRA)
David Li created ARROW-5511:
---

 Summary: [Packaging] Enable Flight in Conda packages
 Key: ARROW-5511
 URL: https://issues.apache.org/jira/browse/ARROW-5511
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Packaging, Python
Reporter: David Li
Assignee: David Li
 Fix For: 0.14.0


We should build Conda packages with Flight enabled.





[jira] [Created] (ARROW-5397) Test Flight TLS support

2019-05-22 Thread David Li (JIRA)
David Li created ARROW-5397:
---

 Summary: Test Flight TLS support 
 Key: ARROW-5397
 URL: https://issues.apache.org/jira/browse/ARROW-5397
 Project: Apache Arrow
  Issue Type: Test
  Components: FlightRPC
Reporter: David Li


TLS support is not tested in Flight. We need to generate certificates/keys and 
provide them to the language-specific test runners.





[jira] [Created] (ARROW-5255) [Java] Implement user-defined data types API

2019-05-03 Thread David Li (JIRA)
David Li created ARROW-5255:
---

 Summary: [Java] Implement user-defined data types API
 Key: ARROW-5255
 URL: https://issues.apache.org/jira/browse/ARROW-5255
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: David Li








[jira] [Created] (ARROW-5254) [Flight][Java] DoAction does not support result streams

2019-05-03 Thread David Li (JIRA)
David Li created ARROW-5254:
---

 Summary: [Flight][Java] DoAction does not support result streams
 Key: ARROW-5254
 URL: https://issues.apache.org/jira/browse/ARROW-5254
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC, Java
Reporter: David Li
Assignee: David Li
 Fix For: 0.14.0


While Flight defines DoAction as returning a stream of results, the Java APIs 
only allow returning a single result.





[jira] [Created] (ARROW-5160) ABORT_NOT_OK evaluates expression twice

2019-04-10 Thread David Li (JIRA)
David Li created ARROW-5160:
---

 Summary: ABORT_NOT_OK evaluates expression twice
 Key: ARROW-5160
 URL: https://issues.apache.org/jira/browse/ARROW-5160
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.13.0
Reporter: David Li
Assignee: David Li
 Fix For: 0.14.0


ABORT_NOT_OK in gtest_util.h evaluates the expression twice due to a typo.





[jira] [Created] (ARROW-5143) [Flight] Enable integration testing of batches with dictionaries

2019-04-08 Thread David Li (JIRA)
David Li created ARROW-5143:
---

 Summary: [Flight] Enable integration testing of batches with 
dictionaries
 Key: ARROW-5143
 URL: https://issues.apache.org/jira/browse/ARROW-5143
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC, Integration
Reporter: David Li
 Fix For: 0.14.0








[jira] [Created] (ARROW-5137) [Flight] Implement authentication APIs

2019-04-08 Thread David Li (JIRA)
David Li created ARROW-5137:
---

 Summary: [Flight] Implement authentication APIs
 Key: ARROW-5137
 URL: https://issues.apache.org/jira/browse/ARROW-5137
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC
Reporter: David Li
Assignee: David Li
 Fix For: 0.14.0


From the mailing list:
{quote}Proposal 3: Add an interface to define authentication protocols on the
 client and server, using the existing Handshake endpoint and adding a
 protocol-defined, per-call token.
{quote}





[jira] [Created] (ARROW-5136) [Flight] Implement call options (timeouts)

2019-04-08 Thread David Li (JIRA)
David Li created ARROW-5136:
---

 Summary: [Flight] Implement call options (timeouts)
 Key: ARROW-5136
 URL: https://issues.apache.org/jira/browse/ARROW-5136
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC
Reporter: David Li
Assignee: David Li
 Fix For: 0.14.0


From the mailing list:
{quote}Proposal 2: In client/server APIs, add a call options parameter to
 control timeouts and provide access to the identity of the
 authenticated peer (if any).
{quote}





[jira] [Created] (ARROW-5114) Test for cross-version Flight compatibility

2019-04-03 Thread David Li (JIRA)
David Li created ARROW-5114:
---

 Summary: Test for cross-version Flight compatibility
 Key: ARROW-5114
 URL: https://issues.apache.org/jira/browse/ARROW-5114
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC
Reporter: David Li


As Flight stabilizes, we should make sure that clients and servers in different 
versions can still communicate.





[jira] [Created] (ARROW-5095) [Flight][C++] Flight DoGet doesn't expose server error message

2019-04-02 Thread David Li (JIRA)
David Li created ARROW-5095:
---

 Summary: [Flight][C++] Flight DoGet doesn't expose server error 
message
 Key: ARROW-5095
 URL: https://issues.apache.org/jira/browse/ARROW-5095
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, FlightRPC
Affects Versions: 0.13.0
Reporter: David Li
Assignee: David Li
 Fix For: 0.14.0


If a server sends an error back in DoGet before sending the schema, the Flight 
client will report only "no data in Flight stream", not the actual error 
message.





[jira] [Created] (ARROW-5081) [C++] Consistently use PATH_SUFFIXES in CMake config

2019-04-01 Thread David Li (JIRA)
David Li created ARROW-5081:
---

 Summary: [C++] Consistently use PATH_SUFFIXES in CMake config
 Key: ARROW-5081
 URL: https://issues.apache.org/jira/browse/ARROW-5081
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: David Li


In trying to set up a build using system libraries installed to non-default 
paths, CMake doesn't consistently search user-specified paths for libraries.

For instance, FindDoubleConversion.cmake will look only at 
${DoubleConversion_ROOT}/libdoubleconversion.so for the shared library, making 
it impossible to have a directory setup like doubleconversion/lib/*.so + 
doubleconversion/include. Other Find*.cmake files set PATH_SUFFIXES to also 
search the lib/ subdirectory; we should do this everywhere.

Additionally, it seems the various Find*.cmake files set PATH_SUFFIXES 
inconsistently. Some hardcode their own list, others use 
CMAKE_LIBRARY_ARCHITECTURE.
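
The difference a suffix list makes can be sketched outside CMake. The following Python snippet only imitates the lookup logic (each root tried with each suffix); the function name, the suffix list and the temporary directory layout are assumptions for illustration, not what any Find*.cmake file does today.

```python
import os
import tempfile


def find_library(name, roots, suffixes=("",)):
    """Return the first existing path to `name` under any root and suffix."""
    for root in roots:
        for suffix in suffixes:
            candidate = os.path.join(root, suffix, name)
            if os.path.isfile(candidate):
                return candidate
    return None


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: <root>/lib/libdoubleconversion.so
    os.mkdir(os.path.join(root, "lib"))
    library = os.path.join(root, "lib", "libdoubleconversion.so")
    open(library, "w").close()

    # Searching only the root misses the library; adding "lib" finds it.
    without_suffixes = find_library("libdoubleconversion.so", [root])
    with_suffixes = find_library("libdoubleconversion.so", [root], ("", "lib", "lib64"))
    found_in_lib = with_suffixes == library

print(without_suffixes, found_in_lib)
```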





[jira] [Created] (ARROW-4984) [Flight][C++] Flight server segfaults when port is in use

2019-03-21 Thread David Li (JIRA)
David Li created ARROW-4984:
---

 Summary: [Flight][C++] Flight server segfaults when port is in use
 Key: ARROW-4984
 URL: https://issues.apache.org/jira/browse/ARROW-4984
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, FlightRPC
Reporter: David Li
Assignee: David Li
 Fix For: 0.14.0


If a Flight server tries to bind to a port in use, it segfaults (as 
impl_->server_ will be nullptr).
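
The general shape of the fix is to check the result of binding before using the server object. Here is a minimal sketch of that pattern with plain Python sockets; the function names are hypothetical and this is not the Flight or gRPC code path.

```python
import socket


def try_listen(port, host="127.0.0.1"):
    """Return a listening socket, or None if the port cannot be bound."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()
        return None
    return sock


def serve(port):
    """Report a bind failure instead of using a missing server object."""
    server = try_listen(port)
    if server is None:
        return "error: could not bind to port %d" % port
    server.close()
    return "ok"


# Occupy an ephemeral port, then try to start a second server on it.
first = try_listen(0)
busy_port = first.getsockname()[1]
result = serve(busy_port)
first.close()
print(result)
```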





[jira] [Created] (ARROW-4947) [Flight][C++/Python] Remove redundant schema parameter in DoGet

2019-03-18 Thread David Li (JIRA)
David Li created ARROW-4947:
---

 Summary: [Flight][C++/Python] Remove redundant schema parameter in 
DoGet
 Key: ARROW-4947
 URL: https://issues.apache.org/jira/browse/ARROW-4947
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC, Python
Reporter: David Li
Assignee: David Li
 Fix For: 0.14.0


Now that the Flight implementations are consistent and DoGet streams are 
self-describing, we should remove the schema parameter to DoGet in C++/Python, 
as it isn't actually used anywhere. We should also enforce that the first 
message in the stream is the schema (Java implicitly does this already).
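
The enforcement could look roughly like the following Python sketch. The {{(kind, payload)}} tuples are a stand-in for real Flight messages and the function name is hypothetical; the point is only that the reader refuses a stream whose first message is not the schema.

```python
def read_stream(messages):
    """Yield (schema, batch) pairs, requiring the first message to be the schema.

    `messages` is an iterable of (kind, payload) tuples; this shape is a
    stand-in for real Flight messages, chosen only for illustration.
    """
    iterator = iter(messages)
    try:
        kind, schema = next(iterator)
    except StopIteration:
        raise ValueError("empty stream: expected a schema message") from None
    if kind != "schema":
        raise ValueError("first message must be the schema, got %r" % kind)
    for kind, payload in iterator:
        if kind != "record_batch":
            raise ValueError("unexpected message kind %r" % kind)
        yield schema, payload
```

Because this is a generator, the errors surface when the caller starts iterating, not when {{read_stream}} is called.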





[jira] [Created] (ARROW-4945) [Flight] Enable Flight integration tests in Travis

2019-03-18 Thread David Li (JIRA)
David Li created ARROW-4945:
---

 Summary: [Flight] Enable Flight integration tests in Travis
 Key: ARROW-4945
 URL: https://issues.apache.org/jira/browse/ARROW-4945
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, FlightRPC
Reporter: David Li
Assignee: David Li
 Fix For: 0.14.0


Need a way to mark the dictionary tests as XFAIL





[jira] [Created] (ARROW-4871) [Flight][Java] Handle large Flight messages

2019-03-14 Thread David Li (JIRA)
David Li created ARROW-4871:
---

 Summary: [Flight][Java] Handle large Flight messages
 Key: ARROW-4871
 URL: https://issues.apache.org/jira/browse/ARROW-4871
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC, Java
Reporter: David Li
Assignee: David Li
 Fix For: 0.14.0


Similarly to ARROW-4421, Java/gRPC needs to be configured to allow large 
messages. The integration tests should also be updated to cover this.





[jira] [Created] (ARROW-4858) [Flight][Python] Enable custom FlightDataStream in Python

2019-03-13 Thread David Li (JIRA)
David Li created ARROW-4858:
---

 Summary: [Flight][Python] Enable custom FlightDataStream in Python
 Key: ARROW-4858
 URL: https://issues.apache.org/jira/browse/ARROW-4858
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC, Python
Reporter: David Li
Assignee: David Li
 Fix For: 0.14.0


We should be able to provide a custom data stream as the result of Flight 
do_get in Python. In particular, when returning data produced on the fly, or 
when returning a large Pandas DataFrame, it'd be nice to provide data in chunks 
as it becomes available, rather than having to copy everything into a Table 
first.

On the Python side, a FlightDataStream wrapper that accepts RecordBatches from 
a Python generator should suffice.
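
A rough sketch of the generator side, in plain Python: lists of rows stand in for RecordBatches, and both function names are hypothetical. The wrapper itself (the part that would hand each chunk to Flight) is not shown.

```python
def iter_chunks(rows, chunk_size):
    """Yield successive chunks of `rows` without materializing the whole input."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield chunk


def produce_rows(count):
    """Hypothetical data source that computes rows on the fly."""
    for i in range(count):
        yield {"id": i, "value": i * i}


sizes = [len(chunk) for chunk in iter_chunks(produce_rows(10), 4)]
print(sizes)
```

Only one chunk is held in memory at a time, which is the property we want when the full result is large or not yet available.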





[jira] [Created] (ARROW-4796) [Flight][Python] segfault in simple server implementation

2019-03-07 Thread David Li (JIRA)
David Li created ARROW-4796:
---

 Summary: [Flight][Python] segfault in simple server implementation
 Key: ARROW-4796
 URL: https://issues.apache.org/jira/browse/ARROW-4796
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC, Python
Reporter: David Li
Assignee: David Li


Python segfaults if you implement a Flight server that returns a data stream 
but does not keep a reference to the underlying data source (the Table, 
RecordBatch, etc). The Flight bindings themselves do not keep a reference to 
the object, so the server will segfault as the memory has been reclaimed.
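
The lifetime problem can be imitated in pure Python with weak references. In this sketch a weak reference plays the role of the borrowed pointer: Python gives back {{None}} where the native code would touch freed memory. All class and function names are hypothetical, and the immediate reclamation relies on CPython's reference counting.

```python
import gc
import weakref


class DataSource:
    """Stand-in for a Table or RecordBatch."""

    def __init__(self, rows):
        self.rows = rows


class BorrowingStream:
    """Holds only a weak reference, like a binding that borrows the data."""

    def __init__(self, source):
        self._source = weakref.ref(source)

    def is_alive(self):
        return self._source() is not None


class OwningStream:
    """Holds a strong reference, so the data outlives the caller's variable."""

    def __init__(self, source):
        self._source = source

    def is_alive(self):
        return self._source is not None


def make_stream(stream_type):
    # The only other reference to the source is this local variable.
    source = DataSource([1, 2, 3])
    return stream_type(source)


borrowing = make_stream(BorrowingStream)
owning = make_stream(OwningStream)
gc.collect()
print(borrowing.is_alive(), owning.is_alive())
```

The fix suggested by this picture is for the stream wrapper to keep its own strong reference to the data source.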





[jira] [Created] (ARROW-4627) [Flight] Add application metadata field to DoPut

2019-02-19 Thread David Li (JIRA)
David Li created ARROW-4627:
---

 Summary: [Flight] Add application metadata field to DoPut
 Key: ARROW-4627
 URL: https://issues.apache.org/jira/browse/ARROW-4627
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC
Reporter: David Li


As [proposed on the mailing 
list|https://lists.apache.org/thread.html/c550264cd60e000d77e10d9d7ac81ea8c49efc37ad447177fa8ee4ee@%3Cdev.arrow.apache.org%3E],
 we should add a field for application-specific metadata in DoPut payloads and 
expose this in the APIs. This also requires changing the client-streaming call 
into a bidirectional streaming call.





[jira] [Created] (ARROW-4626) [Flight] Add application metadata field to DoGet

2019-02-19 Thread David Li (JIRA)
David Li created ARROW-4626:
---

 Summary: [Flight] Add application metadata field to DoGet
 Key: ARROW-4626
 URL: https://issues.apache.org/jira/browse/ARROW-4626
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC
Reporter: David Li


As [proposed on the mailing 
list|https://lists.apache.org/thread.html/c550264cd60e000d77e10d9d7ac81ea8c49efc37ad447177fa8ee4ee@%3Cdev.arrow.apache.org%3E],
 we should add a field for application-specific metadata in DoGet payloads and 
expose this in the APIs. The current APIs are rather RecordBatch-oriented, 
though.





[jira] [Created] (ARROW-4625) [Flight] Wrap server busy-wait methods

2019-02-19 Thread David Li (JIRA)
David Li created ARROW-4625:
---

 Summary: [Flight] Wrap server busy-wait methods
 Key: ARROW-4625
 URL: https://issues.apache.org/jira/browse/ARROW-4625
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC
Reporter: David Li
Assignee: David Li


Right now in Java, you must manually busy-wait in a loop as the gRPC server's 
awaitTermination method isn't exposed. Conversely, in C++, you have no choice 
but to busy-wait as starting the server calls awaitTermination for you. Either 
Java should also wait on the server, or both Java and C++ should expose an 
explicit operation to wait on the server.

I would prefer the latter as then the Python bindings could choose to manually 
busy-wait, which would let Ctrl-C work as normal.
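
On the Python side, the waiting loop could look roughly like this sketch. It waits in short timed intervals so that control keeps returning to the interpreter, where a KeyboardInterrupt can be raised; I'm assuming a single long blocking call into native code would not allow that. The function name and the use of {{threading.Event}} as the shutdown signal are illustrative only.

```python
import threading


def wait_for_shutdown(stop_event, poll_seconds=0.1):
    """Block until `stop_event` is set, waking periodically.

    Returns True on a normal shutdown and False if interrupted by Ctrl-C.
    """
    try:
        while not stop_event.wait(poll_seconds):
            pass  # Woke up with no shutdown request; keep waiting.
    except KeyboardInterrupt:
        stop_event.set()
        return False
    return True


stop = threading.Event()
# Simulate another thread asking the server to shut down.
threading.Timer(0.2, stop.set).start()
finished_normally = wait_for_shutdown(stop)
print(finished_normally)
```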





[jira] [Created] (ARROW-4587) Flight C++ DoPut segfaults

2019-02-15 Thread David Li (JIRA)
David Li created ARROW-4587:
---

 Summary: Flight C++ DoPut segfaults
 Key: ARROW-4587
 URL: https://issues.apache.org/jira/browse/ARROW-4587
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, FlightRPC
Reporter: David Li
Assignee: David Li


After Wes fixed the undefined behavior, it turns out the implementation of 
DoPut on the client side is now wrong. It should construct an IpcPayload 
instead of going through the underlying Protobuf.

Additionally, a previous patch accidentally exposed arrow::ipc::DictionaryBatch 
under arrow::DictionaryBatch.





[jira] [Created] (ARROW-4575) Add Python Flight implementation to integration testing

2019-02-14 Thread David Li (JIRA)
David Li created ARROW-4575:
---

 Summary: Add Python Flight implementation to integration testing
 Key: ARROW-4575
 URL: https://issues.apache.org/jira/browse/ARROW-4575
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC, Integration, Python
Reporter: David Li








[jira] [Created] (ARROW-4484) [Java] improve Flight DoPut busy wait

2019-02-05 Thread David Li (JIRA)
David Li created ARROW-4484:
---

 Summary: [Java] improve Flight DoPut busy wait
 Key: ARROW-4484
 URL: https://issues.apache.org/jira/browse/ARROW-4484
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC, Java
Reporter: David Li


Currently the implementation of putNext in FlightClient.java busy-waits until 
gRPC indicates that the server can receive a message. We should either improve 
the busy-wait (e.g. add sleep times), or rethink the API and make it 
non-blocking.
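
For the first option, a minimal sketch of a wait loop with exponentially growing sleeps, written in Python for brevity. The function name, the delay values and the {{is_ready}} callback are assumptions; in FlightClient.java the readiness check would be whatever gRPC exposes.

```python
import time


def wait_until_ready(is_ready, initial_delay=0.001, max_delay=0.05, timeout=1.0):
    """Poll `is_ready` with exponentially growing sleeps.

    Returns True once `is_ready()` is true, or False after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while not is_ready():
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(delay, remaining))
        # Double the sleep each round, up to a cap.
        delay = min(delay * 2, max_delay)
    return True


# Hypothetical readiness check that succeeds on the fourth poll.
polls = {"count": 0}


def ready_after_four_polls():
    polls["count"] += 1
    return polls["count"] >= 4


became_ready = wait_until_ready(ready_after_four_polls)
print(became_ready, polls["count"])
```

The cap on the delay bounds the extra latency added once the server becomes ready, while the early short sleeps keep the fast path cheap.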





[jira] [Created] (ARROW-4474) [Flight] FlightInfo should use signed integer types for payload size

2019-02-04 Thread David Li (JIRA)
David Li created ARROW-4474:
---

 Summary: [Flight] FlightInfo should use signed integer types for 
payload size
 Key: ARROW-4474
 URL: https://issues.apache.org/jira/browse/ARROW-4474
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC
Reporter: David Li
Assignee: David Li


Judging by the Java implementation, the de facto practice is to use -1 in 
FlightInfo to indicate that the number of records or the size of the payload is 
unknown. However, the Protobuf definition uses an unsigned integer type, as does 
the C++ implementation.
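
To show why the mismatch matters, this small Python sketch reinterprets the bits of -1 as an unsigned value. It assumes a 64-bit field and ignores Protobuf's actual wire encoding; the helper names are hypothetical.

```python
import struct


def as_unsigned_64(value):
    """Reinterpret a signed 64-bit integer's bits as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<q", value))[0]


def as_signed_64(value):
    """Reinterpret an unsigned 64-bit integer's bits as a signed integer."""
    return struct.unpack("<q", struct.pack("<Q", value))[0]


UNKNOWN = -1
# What a reader sees if it treats the same bits as unsigned.
on_the_wire = as_unsigned_64(UNKNOWN)
print(on_the_wire)
print(as_signed_64(on_the_wire))
```

A reader that treats the field as unsigned sees a huge record count instead of the "unknown" sentinel, unless it knows to convert back.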





[jira] [Created] (ARROW-4419) Deal with body buffers in FlightData

2019-01-29 Thread David Li (JIRA)
David Li created ARROW-4419:
---

 Summary: Deal with body buffers in FlightData
 Key: ARROW-4419
 URL: https://issues.apache.org/jira/browse/ARROW-4419
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC
Reporter: David Li


The Java implementation will fail to decode a schema message if the message 
also contains (empty) body buffers (see ArrowMessage.asSchema's precondition 
checks). However, clients using default Protobuf serialization will likely 
write an empty body buffer by default.





[jira] [Created] (ARROW-4230) Enable building flight against system gRPC

2019-01-10 Thread David Li (JIRA)
David Li created ARROW-4230:
---

 Summary: Enable building flight against system gRPC
 Key: ARROW-4230
 URL: https://issues.apache.org/jira/browse/ARROW-4230
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC
Reporter: David Li


Right now Flight assumes that gRPC is vendored or that it is installed with 
CMake. It would be easier to build if it accepted other installations of gRPC, 
such as ones from Conda (eventually) or system package managers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4213) [Flight] C++ and Java implementations are incompatible

2019-01-09 Thread David Li (JIRA)
David Li created ARROW-4213:
---

 Summary: [Flight] C++ and Java implementations are incompatible
 Key: ARROW-4213
 URL: https://issues.apache.org/jira/browse/ARROW-4213
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC
Reporter: David Li


A C++ client cannot request streams from a Java service, nor can it decode the 
schema from GetFlightInfo.

Schema: in Java, GetFlightInfo encodes the schema directly via flatbuffers. C++ 
expects it to be encoded as an IPC message. This isn't a problem in Java as a 
method exists to decode such schemas, but in C++ the API for reading such a 
schema isn't really exposed. I'm willing to submit a patch for this, but it's 
not clear to me which scheme is preferred.

Streams: in Java, DoGet starts with an ArrowMessage containing a schema. C++ 
does not expect this and segfaults when it tries to decode the message as a 
record batch. Based on the presentations I've seen, I think C++ is in the wrong 
here; I have a patch to fix this that I could clean up and submit.


