[jira] [Created] (ARROW-9093) [FlightRPC][C++][Python] Allow setting gRPC client options
David Li created ARROW-9093: --- Summary: [FlightRPC][C++][Python] Allow setting gRPC client options Key: ARROW-9093 URL: https://issues.apache.org/jira/browse/ARROW-9093 Project: Apache Arrow Issue Type: Improvement Components: C++, FlightRPC, Python Reporter: David Li Assignee: David Li There's no way to set generic gRPC options which are useful for tuning behavior (e.g. round-robin load balancing). Rather than bind all of these one by one, gRPC allows setting arguments as generic string-string or string-integer pairs; we could expose this (and leave the interpretation implementation-dependent). -- This message was sent by Atlassian Jira (v8.3.4#803005)
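The proposal above can be sketched in Python as a list of (name, value) pairs whose values are either strings or integers, passed through without interpretation. This is only an illustration: `validate_generic_options` is a hypothetical helper, not an Arrow or gRPC API. The two option names are gRPC channel arguments, chosen here as examples.

```python
from typing import List, Sequence, Tuple, Union

OptionValue = Union[str, int]


def validate_generic_options(
    options: Sequence[Tuple[str, OptionValue]],
) -> List[Tuple[str, OptionValue]]:
    """Check that every option is a (str, str-or-int) pair; do not interpret it."""
    checked = []
    for name, value in options:
        if not isinstance(name, str):
            raise TypeError(f"option name must be a string, got {type(name).__name__}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (str, int)):
            raise TypeError(f"option {name!r} must be a string or an integer")
        checked.append((name, value))
    return checked


# Example pairs: one string-valued option and one integer-valued option.
options = validate_generic_options([
    ("grpc.lb_policy_name", "round_robin"),
    ("grpc.max_receive_message_length", 16 * 1024 * 1024),
])
```

Keeping the values opaque means a binding only has to forward the pairs; what each option does is left to the gRPC implementation underneath.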
[jira] [Created] (ARROW-8975) [FlightRPC][C++] Fix flaky MacOS tests
David Li created ARROW-8975: --- Summary: [FlightRPC][C++] Fix flaky MacOS tests Key: ARROW-8975 URL: https://issues.apache.org/jira/browse/ARROW-8975 Project: Apache Arrow Issue Type: Bug Components: C++, FlightRPC Affects Versions: 0.17.1 Reporter: David Li Assignee: David Li The gRPC MacOS tests have been flaking again. According to [https://github.com/grpc/grpc/issues/20311] the underlying problem may have been fixed, but [https://github.com/grpc/grpc/issues/13856] reports that it has not (in some configurations?), so I will try a few things in CI, or just disable the tests on MacOS.
[jira] [Created] (ARROW-8958) [FlightRPC][Python] Implement Flight DoExchange for Python
David Li created ARROW-8958: --- Summary: [FlightRPC][Python] Implement Flight DoExchange for Python Key: ARROW-8958 URL: https://issues.apache.org/jira/browse/ARROW-8958 Project: Apache Arrow Issue Type: New Feature Components: FlightRPC, Python Reporter: David Li Assignee: David Li Fix For: 1.0.0
[jira] [Created] (ARROW-8957) [FlightRPC][C++] Fail to build due to IpcOptions
David Li created ARROW-8957: --- Summary: [FlightRPC][C++] Fail to build due to IpcOptions Key: ARROW-8957 URL: https://issues.apache.org/jira/browse/ARROW-8957 Project: Apache Arrow Issue Type: Bug Components: C++, FlightRPC Reporter: David Li Assignee: David Li Fix For: 1.0.0
[jira] [Created] (ARROW-8889) [Python] Python 3.7 SIGSEGV when comparing RecordBatch to None
David Li created ARROW-8889: --- Summary: [Python] Python 3.7 SIGSEGV when comparing RecordBatch to None Key: ARROW-8889 URL: https://issues.apache.org/jira/browse/ARROW-8889 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 0.17.1 Reporter: David Li

This seems to only happen for Python 3.6 and 3.7. It doesn't happen with 3.8. It seems to happen even when built from source, but I used the wheels for this reproduction.

{noformat}
> uname -a
Linux chaconne 5.6.13-arch1-1 #1 SMP PREEMPT Thu, 14 May 2020 06:52:53 + x86_64 GNU/Linux
> python --version
Python 3.7.7
> pip freeze
numpy==1.18.4
pyarrow==0.17.1
{noformat}

Reproduction:

{code:python}
import pyarrow as pa

table = pa.Table.from_arrays([pa.array([1,2,3])], names=["a"])
batches = table.to_batches()
batches[0].equals(None)
{code}

{noformat}
#0 0x7fffdf9d34f0 in arrow::RecordBatch::num_columns() const () from /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/libarrow.so.17
#1 0x7fffdf9d69e9 in arrow::RecordBatch::Equals(arrow::RecordBatch const&, bool) const () from /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/libarrow.so.17
#2 0x7fffe084a6e0 in __pyx_pw_7pyarrow_3lib_11RecordBatch_31equals(_object*, _object*, _object*) () from /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so
#3 0x556b97e4 in _PyMethodDef_RawFastCallKeywords (method=0x7fffe0c1b760 <__pyx_methods_7pyarrow_3lib_RecordBatch+288>, self=0x7fffdefd7110, args=0x7786f5c8, nargs=, kwnames=) at /tmp/build/80754af9/python_1585000375785/work/Objects/call.c:694
#4 0x556c06af in _PyMethodDescr_FastCallKeywords (descrobj=0x7fffdefa4050, args=0x7786f5c0, nargs=2, kwnames=0x0) at /tmp/build/80754af9/python_1585000375785/work/Objects/descrobject.c:288
#5 0x55724add in call_function (kwnames=0x0, oparg=2, pp_stack=) at /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:4593
#6 _PyEval_EvalFrameDefault (f=, throwflag=) at /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3110
#7 0x55669289 in _PyEval_EvalCodeWithName (_co=0x778a68a0, globals=, locals=, args=, argcount=, kwnames=0x0, kwargs=0x0, kwcount=, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3930
#8 0x5566a1c4 in PyEval_EvalCodeEx (_co=, globals=, locals=, args=, argcount=, kws=, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3959
#9 0x5566a1ec in PyEval_EvalCode (co=, globals=, locals=) at /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:524
#10 0x55780cb4 in run_mod (mod=, filename=, globals=0x778d7c30, locals=0x778d7c30, flags=, arena=) at /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:1035
#11 0x5578b0d1 in PyRun_FileExFlags (fp=0x558c24d0, filename_str=, start=, globals=0x778d7c30, locals=0x778d7c30, closeit=1, flags=0x7fffe1b0) at /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:988
#12 0x5578b2c3 in PyRun_SimpleFileExFlags (fp=0x558c24d0, filename=, closeit=1, flags=0x7fffe1b0) at /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:429
#13 0x5578c3f5 in pymain_run_file (p_cf=0x7fffe1b0, filename=0x558e51f0 L"repro.py", fp=0x558c24d0) at /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:462
#14 pymain_run_filename (cf=0x7fffe1b0, pymain=0x7fffe2c0) at /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:1641
#15 pymain_run_python (pymain=0x7fffe2c0) at /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:2902
#16 pymain_main (pymain=0x7fffe2c0) at /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:3442
#17 0x5578c51c in _Py_UnixMain (argc=, argv=) at /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:3477
#18 0x77dcd002 in __libc_start_main () from /usr/lib/libc.so.6
#19 0x5572fac0 in _start () at ../sysdeps/x86_64/elf/start.S:103
{noformat}
[jira] [Created] (ARROW-8879) [FlightRPC][Java] FlightStream should unwrap ExecutionExceptions
David Li created ARROW-8879: --- Summary: [FlightRPC][Java] FlightStream should unwrap ExecutionExceptions Key: ARROW-8879 URL: https://issues.apache.org/jira/browse/ARROW-8879 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC, Java Affects Versions: 0.17.1 Reporter: David Li Assignee: David Li Currently FlightStream bubbles a lot of exceptions as RuntimeException or ExecutionException, or just wraps them with CallStatus.INTERNAL. For RuntimeException, we should always check if it's a gRPC StatusRuntimeException and convert to the equivalent Flight exception; for ExecutionException, we should check if the _cause_ is a gRPC exception and convert. Example: on master, FlightStream#getDescriptor reports all errors as CallStatus.INTERNAL, but we should inspect ExecutionException#getCause instead. This is needed so that errors get properly reported, e.g. if a service sends a PERMISSION_DENIED error, the client should get that and not a RuntimeException, ExecutionException, or INTERNAL error.
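The unwrapping described above can be illustrated with a small Python analogue. It is a sketch, not the Java fix: `StatusError` stands in for gRPC's StatusRuntimeException, `FlightError` for the Flight exception, and both names are hypothetical. The helper walks the whole `__cause__` chain, which is slightly more general than inspecting only the direct cause.

```python
class StatusError(Exception):
    """Hypothetical stand-in for a transport-level status exception."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


class FlightError(Exception):
    """Hypothetical stand-in for the exception surfaced to the caller."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def to_flight_error(exc):
    """Convert exc to a FlightError, preserving a wrapped status code if any."""
    seen = set()
    current = exc
    # Follow the cause chain; the seen-set guards against cycles.
    while current is not None and id(current) not in seen:
        if isinstance(current, StatusError):
            return FlightError(current.code, str(current))
        seen.add(id(current))
        current = current.__cause__
    # No status found anywhere in the chain: fall back to INTERNAL.
    return FlightError("INTERNAL", str(exc))


inner = StatusError("PERMISSION_DENIED", "caller may not read this stream")
wrapper = RuntimeError("stream failed")
wrapper.__cause__ = inner
converted = to_flight_error(wrapper)
```

With this shape, the fallback to INTERNAL only happens when no status is present, rather than for every wrapped error.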
[jira] [Created] (ARROW-8858) [FlightRPC] Ensure headers are uniformly exposed
David Li created ARROW-8858: --- Summary: [FlightRPC] Ensure headers are uniformly exposed Key: ARROW-8858 URL: https://issues.apache.org/jira/browse/ARROW-8858 Project: Apache Arrow Issue Type: Bug Components: FlightRPC, Java, Python Affects Versions: 0.17.0 Reporter: David Li Assignee: David Li
* Java: MetadataAdapter should support iterating through binary headers
* Python: binary headers need to be present in the output
[jira] [Created] (ARROW-8776) [FlightRPC][C++] Flight/C++ middleware don't receive headers on failed calls to Java servers
David Li created ARROW-8776: --- Summary: [FlightRPC][C++] Flight/C++ middleware don't receive headers on failed calls to Java servers Key: ARROW-8776 URL: https://issues.apache.org/jira/browse/ARROW-8776 Project: Apache Arrow Issue Type: Bug Components: C++, FlightRPC Affects Versions: 0.17.0 Reporter: David Li Assignee: David Li Fix For: 1.0.0 For a failed call, gRPC/Java may consolidate headers with trailers, so Flight/C++ needs to check both headers and trailers to get any headers that may have been sent.
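A minimal Python sketch of "check both headers and trailers", assuming metadata arrives as lists of (key, value) pairs. `collect_headers` is a hypothetical helper used only to show the merge; it is not the Flight/C++ middleware API.

```python
def collect_headers(headers, trailers):
    """Merge initial headers and trailers into one multimap keyed by lowercase name."""
    merged = {}
    # Headers first, then trailers, so values keep their arrival order.
    for source in (headers, trailers):
        for key, value in source:
            merged.setdefault(key.lower(), []).append(value)
    return merged


# On a failed call the server may send everything as trailers.
failed_call = collect_headers(
    headers=[],
    trailers=[("x-request-id", "abc123"), ("grpc-status", "7")],
)
```

A middleware that reads only the initial headers would see an empty mapping in the failed-call case above.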
[jira] [Created] (ARROW-8775) [C++][FlightRPC] Integration client doesn't run integration tests
David Li created ARROW-8775: --- Summary: [C++][FlightRPC] Integration client doesn't run integration tests Key: ARROW-8775 URL: https://issues.apache.org/jira/browse/ARROW-8775 Project: Apache Arrow Issue Type: Bug Components: C++, FlightRPC, Integration Reporter: David Li Assignee: David Li Looks like I rebased badly.
[jira] [Created] (ARROW-8749) [C++] IpcFormatWriter writes dictionary batches with wrong ID
David Li created ARROW-8749: --- Summary: [C++] IpcFormatWriter writes dictionary batches with wrong ID Key: ARROW-8749 URL: https://issues.apache.org/jira/browse/ARROW-8749 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 0.17.0 Reporter: David Li

IpcFormatWriter assigns dictionary IDs once when it writes the schema message. Then, when it writes dictionary batches, it assigns dictionary IDs again because it re-collects dictionaries from the given batch. So if you have 5 dictionaries, the first dictionary will end up with ID 0 but be written with ID 5. For example, the following test fails with "'_error_or_value11.status()' failed with Key error: No record of dictionary type with id 9":

{code:cpp}
TEST_F(TestMetadata, DoPutDictionaries) {
  ASSERT_OK_AND_ASSIGN(auto sink, arrow::io::BufferOutputStream::Create());
  std::shared_ptr<Schema> schema = ExampleDictSchema();
  BatchVector expected_batches;
  ASSERT_OK(ExampleDictBatches(&expected_batches));
  ASSERT_OK_AND_ASSIGN(auto writer, arrow::ipc::NewStreamWriter(sink.get(), schema));
  for (auto& batch : expected_batches) {
    ASSERT_OK(writer->WriteRecordBatch(*batch));
  }
  ASSERT_OK_AND_ASSIGN(auto buf, sink->Finish());
  arrow::io::BufferReader source(buf);
  ASSERT_OK_AND_ASSIGN(auto reader, arrow::ipc::RecordBatchStreamReader::Open(&source));
  AssertSchemaEqual(schema, reader->schema());
  for (auto& batch : expected_batches) {
    ASSERT_OK_AND_ASSIGN(auto actual, reader->Next());
    AssertBatchesEqual(*actual, *batch);
  }
}
{code}
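The double assignment described in this issue can be modelled in a few lines of Python. This is a toy model, not Arrow's implementation: both classes and the field names `d0` to `d4` are hypothetical. The naive assigner hands out a fresh ID on every call, while the memoized one remembers the ID it gave each field.

```python
class NaiveIdAssigner:
    """Hands out the next free ID every time, even for a field seen before."""

    def __init__(self):
        self._next_id = 0

    def assign(self, field_name):
        assigned = self._next_id
        self._next_id += 1
        return assigned


class MemoizedIdAssigner:
    """Returns the same ID whenever the same field is presented again."""

    def __init__(self):
        self._ids = {}

    def assign(self, field_name):
        # len(self._ids) is evaluated before insertion, so new fields get 0, 1, 2, ...
        return self._ids.setdefault(field_name, len(self._ids))


fields = ["d0", "d1", "d2", "d3", "d4"]

naive = NaiveIdAssigner()
naive_schema_ids = [naive.assign(f) for f in fields]  # pass 1: writing the schema
naive_batch_ids = [naive.assign(f) for f in fields]   # pass 2: writing dictionary batches

memo = MemoizedIdAssigner()
memo_schema_ids = [memo.assign(f) for f in fields]
memo_batch_ids = [memo.assign(f) for f in fields]
```

In the naive case the first dictionary is declared with ID 0 and written with ID 5, and the last is written with ID 9, which matches the ID in the error message quoted above.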
[jira] [Created] (ARROW-8691) [C++] Don't re-initialize Minio in every s3fs benchmark
David Li created ARROW-8691: --- Summary: [C++] Don't re-initialize Minio in every s3fs benchmark Key: ARROW-8691 URL: https://issues.apache.org/jira/browse/ARROW-8691 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: David Li Minio is initialized in a googlebenchmark fixture, which means that every benchmark will start a new server and re-generate all the data. We should control this better so that we can at least share data within a benchmark run, since all benchmarks use the same files.
[jira] [Created] (ARROW-8666) [Java] DenseUnionVector has no way to set offset/validity directly
David Li created ARROW-8666: --- Summary: [Java] DenseUnionVector has no way to set offset/validity directly Key: ARROW-8666 URL: https://issues.apache.org/jira/browse/ARROW-8666 Project: Apache Arrow Issue Type: Bug Components: Java Affects Versions: 0.17.0 Reporter: David Li You can set the type ID manually, but you cannot set the offset or validity directly. Ideally, we'd have an API like Python that lets us build it directly from constituent vectors and the offsets/type IDs.
[jira] [Created] (ARROW-8665) [Java] DenseUnionWriter#setPosition fails with NullPointerException
David Li created ARROW-8665: --- Summary: [Java] DenseUnionWriter#setPosition fails with NullPointerException Key: ARROW-8665 URL: https://issues.apache.org/jira/browse/ARROW-8665 Project: Apache Arrow Issue Type: Bug Components: Java Affects Versions: 0.17.0 Reporter: David Li The writer always iterates through all BaseWriters, and an array of 128 BaseWriters is allocated. So if you do not have 128 typeIds and do not touch all of them, setPosition will give you an exception.
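A Python analogue of the failure mode above, assuming a fixed array of 128 slots where unused slots hold `None`. The class and function names are hypothetical, and skipping empty slots is only one possible fix, not necessarily the one adopted in Arrow.

```python
class Writer:
    """Hypothetical stand-in for a per-type child writer."""

    def __init__(self):
        self.position = 0


def set_position_unchecked(writers, index):
    # Mirrors the reported behaviour: every slot is touched, including empty ones.
    for writer in writers:
        writer.position = index  # fails on None, like the NullPointerException


def set_position(writers, index):
    # One possible fix: skip slots that were never populated.
    for writer in writers:
        if writer is not None:
            writer.position = index


writers = [None] * 128  # one slot per possible type ID
writers[3] = Writer()   # only a single type ID is in use
set_position(writers, 7)
```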
[jira] [Created] (ARROW-8572) [Python] Expose UnionArray.array and other fields
David Li created ARROW-8572: --- Summary: [Python] Expose UnionArray.array and other fields Key: ARROW-8572 URL: https://issues.apache.org/jira/browse/ARROW-8572 Project: Apache Arrow Issue Type: Improvement Components: Python Affects Versions: 0.17.0 Reporter: David Li Assignee: David Li Currently in Python, you can construct a UnionArray easily, but getting the data back out (without copying) is near-impossible. We should expose the getter for UnionArray.array so we can pull out the constituent arrays. We should also expose fields like mode while we're at it. The use case is: in Flight, we'd like to write multiple distinct datasets (with distinct schemas) in a single logical call; using UnionArrays lets us combine these datasets into a single logical dataset.
[jira] [Created] (ARROW-8555) [FlightRPC][Java] Implement Flight DoExchange for Java
David Li created ARROW-8555: --- Summary: [FlightRPC][Java] Implement Flight DoExchange for Java Key: ARROW-8555 URL: https://issues.apache.org/jira/browse/ARROW-8555 Project: Apache Arrow Issue Type: New Feature Components: FlightRPC Reporter: David Li Assignee: David Li As described in the mailing list vote.
[jira] [Created] (ARROW-8487) [FlightRPC][C++] Make it possible to target a specific payload size
David Li created ARROW-8487: --- Summary: [FlightRPC][C++] Make it possible to target a specific payload size Key: ARROW-8487 URL: https://issues.apache.org/jira/browse/ARROW-8487 Project: Apache Arrow Issue Type: Improvement Components: C++, FlightRPC Reporter: David Li gRPC by default limits message sizes on the wire. While Flight in turn disables these limits by default, they're still useful for controlling memory consumption. A well-behaved client/server may wish to split up writes to respect these limits. However, right now, there's no way to measure the memory usage of what you're about to write without serializing it. With ARROW-5377, we can in theory avoid this by having the writer take control of serialization, producing the IpcPayload, then measuring the size and writing the payload if the size is as desired. However, Flight doesn't provide such a low-level mechanism yet - we'd need to open that up as well.
[jira] [Created] (ARROW-8327) [FlightRPC][Java] gRPC trailers may be null
David Li created ARROW-8327: --- Summary: [FlightRPC][Java] gRPC trailers may be null Key: ARROW-8327 URL: https://issues.apache.org/jira/browse/ARROW-8327 Project: Apache Arrow Issue Type: Bug Components: FlightRPC, Java Reporter: David Li Assignee: David Li This can cause spurious failures.
[jira] [Created] (ARROW-8297) [FlightRPC][C++] Implement Flight DoExchange for C++
David Li created ARROW-8297: --- Summary: [FlightRPC][C++] Implement Flight DoExchange for C++ Key: ARROW-8297 URL: https://issues.apache.org/jira/browse/ARROW-8297 Project: Apache Arrow Issue Type: New Feature Components: FlightRPC Reporter: David Li Assignee: David Li As described in the mailing list vote.
[jira] [Created] (ARROW-8176) [FlightRPC][Integration] Have Flight services bind to port 0 in integration
David Li created ARROW-8176: --- Summary: [FlightRPC][Integration] Have Flight services bind to port 0 in integration Key: ARROW-8176 URL: https://issues.apache.org/jira/browse/ARROW-8176 Project: Apache Arrow Issue Type: Bug Components: FlightRPC, Integration Reporter: David Li In integration tests, instead of allocating a port and then trying to bind to it, we should have the Flight server bind to port 0, then have the test runner parse out the port. This avoids flakiness due to port collisions. It will also give us the ability to know when the Flight server has actually started.
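The bind-to-port-0 approach can be sketched with plain sockets in Python. This is not the integration test runner's code: the startup line format `Server listening on HOST:PORT` and both helper names are assumptions made for the example.

```python
import re
import socket


def bind_ephemeral(host="127.0.0.1"):
    """Bind a listening TCP socket to port 0 and return (socket, actual_port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, 0))  # port 0 asks the OS to pick a free port
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock, sock.getsockname()[1]


# Hypothetical startup line a server could print once it is listening.
STARTUP_PATTERN = re.compile(r"^Server listening on (\S+):(\d+)$")


def parse_port(line):
    """Return the port from a startup line, or None if the line does not match."""
    match = STARTUP_PATTERN.match(line.strip())
    return int(match.group(2)) if match else None


server_socket, port = bind_ephemeral()
try:
    announcement = f"Server listening on 127.0.0.1:{port}"
    parsed = parse_port(announcement)
finally:
    server_socket.close()
```

Because the announcement is printed only after `listen()` succeeds, a runner that waits for this line also learns that the server is ready to accept connections.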
[jira] [Created] (ARROW-8160) [FlightRPC][C++] DoPutPayloadWriter doesn't always expose server error message
David Li created ARROW-8160: --- Summary: [FlightRPC][C++] DoPutPayloadWriter doesn't always expose server error message Key: ARROW-8160 URL: https://issues.apache.org/jira/browse/ARROW-8160 Project: Apache Arrow Issue Type: Bug Components: C++, FlightRPC Affects Versions: 0.16.0 Reporter: David Li

{noformat}
C:/projects/arrow/cpp/src/arrow/flight/flight_test.cc(1261): error: Value of: status.message()
Expected: has substring "Invalid token"
Actual: "Could not write record batch to stream: "
[ FAILED ] TestBasicAuthHandler.FailUnauthenticatedCalls (17 ms)
{noformat}

This happens because {{Close()}} calls {{RecordBatchPayloadWriter::Close()}}, which calls {{CheckStarted}}, which in turn tries to write data. If the data gets flushed and the server responds in time, we'll see a failure during writing, causing us to never check the server status (which is the last part of {{DoPutPayloadWriter::Close}}). We need to reliably check and expose the gRPC status.
[jira] [Created] (ARROW-8152) [C++] IO: split large coalesced reads into smaller ones
David Li created ARROW-8152: --- Summary: [C++] IO: split large coalesced reads into smaller ones Key: ARROW-8152 URL: https://issues.apache.org/jira/browse/ARROW-8152 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: David Li We have a facility to coalesce small reads, but remote filesystems may also benefit from splitting large reads to take advantage of concurrency.
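Splitting one large read into smaller ranges can be sketched as follows in Python. The function name and the `max_chunk` parameter are hypothetical; the issue does not specify how a chunk size would be chosen.

```python
def split_range(offset, length, max_chunk):
    """Split the byte range [offset, offset + length) into pieces of at most max_chunk bytes."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    ranges = []
    end = offset + length
    while offset < end:
        size = min(max_chunk, end - offset)
        ranges.append((offset, size))
        offset += size
    return ranges


# A 10-byte read split into pieces of at most 4 bytes.
pieces = split_range(0, 10, 4)
```

Each resulting (offset, length) pair could then be issued as an independent request, which is where a remote filesystem would gain concurrency.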
[jira] [Created] (ARROW-8151) [Benchmarking][Dataset] Benchmark Parquet read performance with S3File
David Li created ARROW-8151: --- Summary: [Benchmarking][Dataset] Benchmark Parquet read performance with S3File Key: ARROW-8151 URL: https://issues.apache.org/jira/browse/ARROW-8151 Project: Apache Arrow Issue Type: Bug Components: Benchmarking, C++ - Dataset Reporter: David Li Assignee: David Li We should establish a performance baseline with the current S3File implementation and Parquet reader before proceeding with work like PARQUET-1698.
[jira] [Created] (ARROW-8112) [FlightRPC][C++] Some status codes don't round-trip through gRPC
David Li created ARROW-8112: --- Summary: [FlightRPC][C++] Some status codes don't round-trip through gRPC Key: ARROW-8112 URL: https://issues.apache.org/jira/browse/ARROW-8112 Project: Apache Arrow Issue Type: Bug Components: C++, FlightRPC Affects Versions: 0.16.0 Reporter: David Li Assignee: David Li KeyError and AlreadyExists don't fully round-trip, instead becoming UNKNOWN. There are others, but we don't attempt to map every Arrow status to a gRPC status, only the ones that closely correspond to a gRPC error.
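A round-trip check of this kind can be sketched in Python with two small lookup tables. The tables below are deliberately incomplete and hypothetical, not Arrow's actual mapping; in particular, pairing KeyError with NOT_FOUND and AlreadyExists with ALREADY_EXISTS is an assumption made for the example.

```python
# Hypothetical, incomplete mapping used only to demonstrate the check.
ARROW_TO_GRPC = {
    "Invalid": "INVALID_ARGUMENT",
    "NotImplemented": "UNIMPLEMENTED",
}


def round_trip(code, to_grpc):
    """Map an Arrow-style code to a gRPC-style code and back again."""
    from_grpc = {grpc: arrow for arrow, grpc in to_grpc.items()}
    grpc_code = to_grpc.get(code, "UNKNOWN")         # unmapped codes collapse to UNKNOWN
    return from_grpc.get(grpc_code, "UnknownError")  # and cannot be recovered


def lossy_codes(codes, to_grpc):
    """Return the codes that do not survive a round trip."""
    return [code for code in codes if round_trip(code, to_grpc) != code]


codes = ["Invalid", "NotImplemented", "KeyError", "AlreadyExists"]
lossy_before = lossy_codes(codes, ARROW_TO_GRPC)

# Assumed pairings for the two codes named in the issue.
extended = dict(ARROW_TO_GRPC, KeyError="NOT_FOUND", AlreadyExists="ALREADY_EXISTS")
lossy_after = lossy_codes(codes, extended)
```

A check like `lossy_codes` makes it cheap to list which codes are knowingly left unmapped.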
[jira] [Created] (ARROW-8101) [FlightRPC][Java] Can't read/write only an empty null array
David Li created ARROW-8101: --- Summary: [FlightRPC][Java] Can't read/write only an empty null array Key: ARROW-8101 URL: https://issues.apache.org/jira/browse/ARROW-8101 Project: Apache Arrow Issue Type: Bug Components: FlightRPC, Java Affects Versions: 0.16.0 Reporter: David Li Assignee: David Li This is rather an edge case, but Java/Flight fails with a table consisting of only an empty null array, since it has no buffers, and Java assumes this can never happen. {noformat} Exception in thread "main" org.apache.arrow.flight.FlightRuntimeException: CallStatus{code=CANCELLED, cause=java.lang.RuntimeException: Unexpected IO Exception, description='Failed to stream message'} at org.apache.arrow.flight.CallStatus.toRuntimeException(CallStatus.java:113) at org.apache.arrow.flight.grpc.StatusUtils.fromGrpcRuntimeException(StatusUtils.java:134) at org.apache.arrow.flight.grpc.StatusUtils.fromThrowable(StatusUtils.java:142) at org.apache.arrow.flight.FlightClient$SetStreamObserver.onError(FlightClient.java:315) at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:442) at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) at org.apache.arrow.flight.grpc.ClientInterceptorAdapter$FlightClientCallListener.onClose(ClientInterceptorAdapter.java:117) at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:700) at 
io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:399) at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:510) at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:66) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:630) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$700(ClientCallImpl.java:518) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:692) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:681) at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.RuntimeException: Unexpected IO Exception at org.apache.arrow.flight.ArrowMessage.asInputStream(ArrowMessage.java:334) at org.apache.arrow.flight.ArrowMessage.access$000(ArrowMessage.java:64) at org.apache.arrow.flight.ArrowMessage$ArrowMessageHolderMarshaller.stream(ArrowMessage.java:382) at org.apache.arrow.flight.ArrowMessage$ArrowMessageHolderMarshaller.stream(ArrowMessage.java:372) at io.grpc.MethodDescriptor.streamRequest(MethodDescriptor.java:290) at io.grpc.internal.ClientCallImpl.sendMessageInternal(ClientCallImpl.java:473) at io.grpc.internal.ClientCallImpl.sendMessage(ClientCallImpl.java:457) at 
io.grpc.ForwardingClientCall.sendMessage(ForwardingClientCall.java:37) at io.grpc.ForwardingClientCall.sendMessage(ForwardingClientCall.java:37) at io.grpc.ForwardingClientCall.sendMessage(ForwardingClientCall.java:37) at io.grpc.stub.ClientCalls$CallToStreamObserverAdapter.onNext(ClientCalls.java:341) at org.apache.arrow.flight.FlightClient$PutObserver.putNext(FlightClient.java:354) at org.apache.arrow.flight.example.integration.IntegrationTestClient.testStream(IntegrationTestClient.java:132) at
[jira] [Created] (ARROW-7734) [C++] Segfault when comparing status with and without detail
David Li created ARROW-7734: --- Summary: [C++] Segfault when comparing status with and without detail Key: ARROW-7734 URL: https://issues.apache.org/jira/browse/ARROW-7734 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 0.16.0 Reporter: David Li Assignee: David Li I noticed this while working on Flight integration tests. The equality operator for Status doesn't check whether the status detail is nullptr before dereferencing it.
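The null check described above has a direct Python analogue, where using `None` as an object raises AttributeError instead of crashing. The `Status` and `Detail` classes below are hypothetical stand-ins and only illustrate the order of the checks, not the C++ fix itself.

```python
class Detail:
    """Hypothetical status detail with a string form."""

    def __init__(self, text):
        self.text = text

    def to_string(self):
        return self.text


class Status:
    """Hypothetical status with an optional detail."""

    def __init__(self, code, message, detail=None):
        self.code = code
        self.message = message
        self.detail = detail

    def __eq__(self, other):
        if not isinstance(other, Status):
            return NotImplemented
        if self.code != other.code or self.message != other.message:
            return False
        # Check for a missing detail on either side before using it.
        if self.detail is None or other.detail is None:
            return self.detail is other.detail
        return self.detail.to_string() == other.detail.to_string()


with_detail = Status("IOError", "read failed", Detail("errno 5"))
without_detail = Status("IOError", "read failed")
same = with_detail == without_detail
```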
[jira] [Created] (ARROW-7721) [FlightRPC][Java] Flaky Flight auth test
David Li created ARROW-7721: --- Summary: [FlightRPC][Java] Flaky Flight auth test Key: ARROW-7721 URL: https://issues.apache.org/jira/browse/ARROW-7721 Project: Apache Arrow Issue Type: Bug Components: FlightRPC, Java Affects Versions: 0.16.0 Reporter: David Li

See https://github.com/apache/arrow/commit/8b42288f58caa84a40bb7a13c1731ff919c934f2/checks?check_suite_id=426509031

{noformat}
[ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.171 s <<< FAILURE! - in org.apache.arrow.flight.auth.TestBasicAuth
[ERROR] asyncCall Time elapsed: 0.068 s <<< ERROR!
java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (65536)
Allocator(ROOT) 0/65536/131584/9223372036854775807 (res/actual/peak/limit)
at org.apache.arrow.flight.auth.TestBasicAuth.shutdown(TestBasicAuth.java:152)
{noformat}
[jira] [Created] (ARROW-7663) from_pandas gives TypeError instead of ArrowTypeError in some cases
David Li created ARROW-7663: --- Summary: from_pandas gives TypeError instead of ArrowTypeError in some cases Key: ARROW-7663 URL: https://issues.apache.org/jira/browse/ARROW-7663 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 0.15.1 Reporter: David Li

from_pandas sometimes raises a TypeError with an uninformative error message rather than an ArrowTypeError with the full, informative type error for mixed-type array columns:

{noformat}
>>> pa.Table.from_pandas(pd.DataFrame({"a": ['a', 1]}))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/table.pxi", line 1177, in pyarrow.lib.Table.from_pandas
  File "/Users/lidavidm/Flight/arrow/build/python/lib.macosx-10.12-x86_64-3.7/pyarrow/pandas_compat.py", line 575, in dataframe_to_arrays
    for c, f in zip(columns_to_convert, convert_fields)]
  File "/Users/lidavidm/Flight/arrow/build/python/lib.macosx-10.12-x86_64-3.7/pyarrow/pandas_compat.py", line 575, in <listcomp>
    for c, f in zip(columns_to_convert, convert_fields)]
  File "/Users/lidavidm/Flight/arrow/build/python/lib.macosx-10.12-x86_64-3.7/pyarrow/pandas_compat.py", line 566, in convert_column
    raise e
  File "/Users/lidavidm/Flight/arrow/build/python/lib.macosx-10.12-x86_64-3.7/pyarrow/pandas_compat.py", line 560, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 265, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 80, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 107, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: ("Expected a bytes object, got a 'int' object", 'Conversion failed for column a with type object')
>>> pa.Table.from_pandas(pd.DataFrame({"a": [1, 'a']}))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/table.pxi", line 1177, in pyarrow.lib.Table.from_pandas
  File "/Users/lidavidm/Flight/arrow/build/python/lib.macosx-10.12-x86_64-3.7/pyarrow/pandas_compat.py", line 575, in dataframe_to_arrays
    for c, f in zip(columns_to_convert, convert_fields)]
  File "/Users/lidavidm/Flight/arrow/build/python/lib.macosx-10.12-x86_64-3.7/pyarrow/pandas_compat.py", line 575, in <listcomp>
    for c, f in zip(columns_to_convert, convert_fields)]
  File "/Users/lidavidm/Flight/arrow/build/python/lib.macosx-10.12-x86_64-3.7/pyarrow/pandas_compat.py", line 560, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 265, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 80, in pyarrow.lib._ndarray_to_array
TypeError: an integer is required (got type str)
{noformat}

Noticed on 0.15.1 and on master when we tried to upgrade.
[jira] [Created] (ARROW-7579) [FlightRPC] Make Handshake optional
David Li created ARROW-7579: --- Summary: [FlightRPC] Make Handshake optional Key: ARROW-7579 URL: https://issues.apache.org/jira/browse/ARROW-7579 Project: Apache Arrow Issue Type: Bug Components: FlightRPC Reporter: David Li Fix For: 1.0.0 We should make it possible to _not_ invoke Handshake for services that don't want it. This matters especially with flight-grpc, where the standard gRPC authentication mechanisms don't know about Flight and try to authenticate the Handshake endpoint; it's easy to forget to configure this endpoint to bypass authentication.
[jira] [Created] (ARROW-7538) Clarify actual and desired size in AllocationManager
David Li created ARROW-7538: --- Summary: Clarify actual and desired size in AllocationManager Key: ARROW-7538 URL: https://issues.apache.org/jira/browse/ARROW-7538 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: David Li As a follow up to the review of ARROW-7329, we should clarify the different sizes (desired vs actual size) in AllocationManager: https://github.com/apache/arrow/pull/5973#discussion_r354729754
[jira] [Created] (ARROW-7477) [FlightRPC][Java] Flight gRPC service is missing reflection info
David Li created ARROW-7477: --- Summary: [FlightRPC][Java] Flight gRPC service is missing reflection info Key: ARROW-7477 URL: https://issues.apache.org/jira/browse/ARROW-7477 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC, Java Affects Versions: 0.14.1 Reporter: David Li Assignee: David Li Fix For: 1.0.0 When setting up the gRPC service, we mangle the gRPC [service descriptor|https://github.com/apache/arrow/blob/master/java/flight/src/main/java/org/apache/arrow/flight/FlightBindingService.java], removing reflection information. This means that gRPC reflection doesn't work, which debugging/development tools like [grpcurl|https://github.com/fullstorydev/grpcurl/] require. Reflection information is also useful for things like authorization/access control based on RPC method.
[jira] [Created] (ARROW-7343) Memory leak in Flight ArrowMessage
David Li created ARROW-7343: --- Summary: Memory leak in Flight ArrowMessage Key: ARROW-7343 URL: https://issues.apache.org/jira/browse/ARROW-7343 Project: Apache Arrow Issue Type: Bug Components: FlightRPC, Java Affects Versions: 0.15.1 Reporter: David Li Assignee: David Li I believe this causes things like ARROW-4765. If a stream is interrupted or otherwise not drained on the server-side, the serialized form of the ArrowMessage (DrainableByteBufInputStream) will sit around forever, leaking memory.
[jira] [Created] (ARROW-7254) BaseVariableWidthVector#setSafe appears to make value offsets inconsistent
David Li created ARROW-7254: --- Summary: BaseVariableWidthVector#setSafe appears to make value offsets inconsistent Key: ARROW-7254 URL: https://issues.apache.org/jira/browse/ARROW-7254 Project: Apache Arrow Issue Type: Bug Components: Java Affects Versions: 0.15.1 Reporter: David Li
The following program writes a file which PyArrow either segfaults on (0.14.1) or rejects with an error (0.15.1) when reading it: {{pyarrow.lib.ArrowInvalid: Column 0: Offset invariant failure at: 2 inconsistent value_offsets for null slot0!=4}}. Calling {{setRowCount}} again, or calling {{setSafe}} with a higher index, fixes it. While it seems from the new documentation that we should (must?) call {{VectorSchemaRoot#setRowCount}} at the end, I wouldn't have expected to get an invalid file by using {{setSafe}}, either.
Full traceback:
{noformat}
> python3 -c 'import pyarrow as pa; print(pa.ipc.open_stream(open("./test.bin", "rb")).read_pandas())'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/lidavidm/Flight/arrow-5137-auth/java/venv/lib/python3.7/site-packages/pyarrow/ipc.py", line 46, in read_pandas
    table = self.read_all()
  File "pyarrow/ipc.pxi", line 330, in pyarrow.lib._CRecordBatchReader.read_all
  File "pyarrow/public-api.pxi", line 321, in pyarrow.lib.pyarrow_wrap_table
  File "pyarrow/error.pxi", line 78, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 0: Offset invariant failure at: 2 inconsistent value_offsets for null slot0!=4
{noformat}
Full program:
{code:java}
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowStreamWriter;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;

public class AsdfTest {
  public static void main(String[] args) throws Exception {
    // A single nullable UTF-8 column named "a".
    Schema schema = new Schema(Collections.singletonList(Field.nullable("a", new ArrowType.Utf8())));
    try (BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
         VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator)) {
      // The row count is set before any value is written; row 1 is never set.
      root.setRowCount(2);
      VarCharVector v = (VarCharVector) root.getVector("a");
      v.setSafe(0, "asdf".getBytes(StandardCharsets.UTF_8));
      try (OutputStream output = Files.newOutputStream(Paths.get("./test.bin"))) {
        ArrowStreamWriter writer = new ArrowStreamWriter(root, null, output);
        writer.writeBatch();
        writer.close();
      }
    }
  }
}
{code}
{{v.setNull(1)}} after {{v.setSafe(0, "asdf")}} does not fix it. Using {{set}} instead of {{setSafe}} will fail in Java. -- This message was sent by Atlassian Jira (v8.3.4#803005)
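The invariant behind the PyArrow error can be sketched without Arrow itself. A variable-width vector keeps an offsets buffer with one more entry than the row count, and the offsets must never decrease. Reading the error message, entry 2 appears to be 0 while entry 1 is 4; that reading is an assumption, as are the class and method names below, which are hypothetical and not part of Arrow.

```java
import java.util.OptionalInt;

class OffsetCheck {
    /** Returns the first index i where offsets[i] is smaller than offsets[i - 1], if any. */
    static OptionalInt firstInconsistentOffset(int[] offsets) {
        for (int i = 1; i < offsets.length; i++) {
            if (offsets[i] < offsets[i - 1]) {
                return OptionalInt.of(i);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Two rows: "asdf" (4 bytes) followed by a null. A null slot still
        // repeats the previous offset, so a consistent buffer is {0, 4, 4}.
        int[] consistent = {0, 4, 4};
        // Assumed shape of the buffer in the rejected file: the last entry
        // was never written and stays at 0.
        int[] suspected = {0, 4, 0};

        System.out.println(firstInconsistentOffset(consistent));
        System.out.println(firstInconsistentOffset(suspected));
    }
}
```

If the assumption holds, the second call reports index 2, which matches the position named in the error message.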
[jira] [Created] (ARROW-6867) [FlightRPC][Java] Flight server can hang JVM on shutdown
David Li created ARROW-6867: --- Summary: [FlightRPC][Java] Flight server can hang JVM on shutdown Key: ARROW-6867 URL: https://issues.apache.org/jira/browse/ARROW-6867 Project: Apache Arrow Issue Type: Bug Components: FlightRPC, Java Affects Versions: 0.15.0 Reporter: David Li Assignee: David Li Fix For: 1.0.0 I noticed this while working on Flight integration tests. FlightService keeps an executor, which can hang the JVM on shutdown if the executor itself is not shut down. It's used by Handshake and DoPut. I think this surfaced because I wrote an AuthHandler that threw an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005)
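A minimal sketch of the general mechanism, assuming the executor uses non-daemon worker threads (the default for the standard factories): such threads keep the JVM alive until the executor is shut down. The class below is hypothetical and only illustrates tying the executor's lifetime to its owner; it is not the FlightService code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ExecutorOwner implements AutoCloseable {
    // Fixed pools keep their (non-daemon) worker threads around indefinitely,
    // so the JVM cannot exit until shutdown() is called.
    private final ExecutorService executor = Executors.newFixedThreadPool(2);

    ExecutorService executor() {
        return executor;
    }

    @Override
    public void close() throws InterruptedException {
        // Stop accepting work, give running tasks a moment, then interrupt them.
        executor.shutdown();
        if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        try (ExecutorOwner owner = new ExecutorOwner()) {
            owner.executor().submit(() -> System.out.println("handled call")).get();
        }
        // Without close(), main would return here but the process would not exit.
    }
}
```

Closing the owner in a try-with-resources block also covers the case where a handler throws, which is the situation described above.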
[jira] [Created] (ARROW-5643) Add ability to override hostname checking
David Li created ARROW-5643: --- Summary: Add ability to override hostname checking Key: ARROW-5643 URL: https://issues.apache.org/jira/browse/ARROW-5643 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC Reporter: David Li Assignee: David Li Fix For: 0.14.0 We should add the ability to override hostname checks, so you can connect to localhost over TLS but still verify that the certificate is for some other domain. Example: when deploying on Kubernetes with headless services, clients connect directly to backend services and do load balancing themselves. Thus all instances of an application must present a certificate for the same hostname. To do health checks in such an environment, you can't connect to the TLS hostname (which may resolve to a different instance); you need to connect to localhost, and override the hostname check. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5529) [Flight] Allow serving with multiple TLS certificates
David Li created ARROW-5529: --- Summary: [Flight] Allow serving with multiple TLS certificates Key: ARROW-5529 URL: https://issues.apache.org/jira/browse/ARROW-5529 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC Reporter: David Li Assignee: David Li We should allow serving a Flight service with more than one TLS certificate. This makes health checking easier in large deployments. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5511) [Packaging] Enable Flight in Conda packages
David Li created ARROW-5511: --- Summary: [Packaging] Enable Flight in Conda packages Key: ARROW-5511 URL: https://issues.apache.org/jira/browse/ARROW-5511 Project: Apache Arrow Issue Type: Improvement Components: C++, Packaging, Python Reporter: David Li Assignee: David Li Fix For: 0.14.0 We should build Conda packages with Flight enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5397) Test Flight TLS support
David Li created ARROW-5397: --- Summary: Test Flight TLS support Key: ARROW-5397 URL: https://issues.apache.org/jira/browse/ARROW-5397 Project: Apache Arrow Issue Type: Test Components: FlightRPC Reporter: David Li TLS support is not tested in Flight. We need to generate certificates/keys and provide them to the language-specific test runners. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5255) [Java] Implement user-defined data types API
David Li created ARROW-5255: --- Summary: [Java] Implement user-defined data types API Key: ARROW-5255 URL: https://issues.apache.org/jira/browse/ARROW-5255 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: David Li -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5254) [Flight][Java] DoAction does not support result streams
David Li created ARROW-5254: --- Summary: [Flight][Java] DoAction does not support result streams Key: ARROW-5254 URL: https://issues.apache.org/jira/browse/ARROW-5254 Project: Apache Arrow Issue Type: Bug Components: FlightRPC, Java Reporter: David Li Assignee: David Li Fix For: 0.14.0 While Flight defines DoAction as returning a stream of results, the Java APIs only allow returning a single result. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5160) ABORT_NOT_OK evaluates expression twice
David Li created ARROW-5160: --- Summary: ABORT_NOT_OK evaluates expression twice Key: ARROW-5160 URL: https://issues.apache.org/jira/browse/ARROW-5160 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 0.13.0 Reporter: David Li Assignee: David Li Fix For: 0.14.0 ABORT_NOT_OK in gtest_util.h evaluates the expression twice due to a typo. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5143) [Flight] Enable integration testing of batches with dictionaries
David Li created ARROW-5143: --- Summary: [Flight] Enable integration testing of batches with dictionaries Key: ARROW-5143 URL: https://issues.apache.org/jira/browse/ARROW-5143 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC, Integration Reporter: David Li Fix For: 0.14.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5137) [Flight] Implement authentication APIs
David Li created ARROW-5137: --- Summary: [Flight] Implement authentication APIs Key: ARROW-5137 URL: https://issues.apache.org/jira/browse/ARROW-5137 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC Reporter: David Li Assignee: David Li Fix For: 0.14.0 From the mailing list: {quote}Proposal 3: Add an interface to define authentication protocols on the client and server, using the existing Handshake endpoint and adding a protocol-defined, per-call token. {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5136) [Flight] Implement call options (timeouts)
David Li created ARROW-5136: --- Summary: [Flight] Implement call options (timeouts) Key: ARROW-5136 URL: https://issues.apache.org/jira/browse/ARROW-5136 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC Reporter: David Li Assignee: David Li Fix For: 0.14.0 From the mailing list: {quote}Proposal 2: In client/server APIs, add a call options parameter to control timeouts and provide access to the identity of the authenticated peer (if any). {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5114) Test for cross-version Flight compatibility
David Li created ARROW-5114: --- Summary: Test for cross-version Flight compatibility Key: ARROW-5114 URL: https://issues.apache.org/jira/browse/ARROW-5114 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC Reporter: David Li As Flight stabilizes, we should make sure that clients and servers in different versions can still communicate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5095) [Flight][C++] Flight DoGet doesn't expose server error message
David Li created ARROW-5095: --- Summary: [Flight][C++] Flight DoGet doesn't expose server error message Key: ARROW-5095 URL: https://issues.apache.org/jira/browse/ARROW-5095 Project: Apache Arrow Issue Type: Bug Components: C++, FlightRPC Affects Versions: 0.13.0 Reporter: David Li Assignee: David Li Fix For: 0.14.0 If a server sends an error back in DoGet before sending the schema, the Flight client will report only "no data in Flight stream", not the actual error message. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5081) [C++] Consistently use PATH_SUFFIXES in CMake config
David Li created ARROW-5081: --- Summary: [C++] Consistently use PATH_SUFFIXES in CMake config Key: ARROW-5081 URL: https://issues.apache.org/jira/browse/ARROW-5081 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: David Li In trying to set up a build using system libraries installed to non-default paths, CMake doesn't consistently search user-specified paths for libraries. For instance, FindDoubleConversion.cmake will look only at ${DoubleConversion_ROOT}/libdoubleconversion.so for the shared library, making it impossible to have a directory setup like doubleconversion/lib/*.so + doubleconversion/include. Other Find*.cmake files set PATH_SUFFIXES to also search the lib/ subdirectory; we should do this everywhere. Additionally, it seems the various Find*.cmake files set PATH_SUFFIXES inconsistently. Some hardcode their own list, others use CMAKE_LIBRARY_ARCHITECTURE. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4984) [Flight][C++] Flight server segfaults when port is in use
David Li created ARROW-4984: --- Summary: [Flight][C++] Flight server segfaults when port is in use Key: ARROW-4984 URL: https://issues.apache.org/jira/browse/ARROW-4984 Project: Apache Arrow Issue Type: Bug Components: C++, FlightRPC Reporter: David Li Assignee: David Li Fix For: 0.14.0 If a Flight server tries to bind to a port in use, it segfaults (as impl_->server_ will be nullptr). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4947) [Flight][C++/Python] Remove redundant schema parameter in DoGet
David Li created ARROW-4947: --- Summary: [Flight][C++/Python] Remove redundant schema parameter in DoGet Key: ARROW-4947 URL: https://issues.apache.org/jira/browse/ARROW-4947 Project: Apache Arrow Issue Type: Improvement Components: C++, FlightRPC, Python Reporter: David Li Assignee: David Li Fix For: 0.14.0 Now that the Flight implementations are consistent and DoGet streams are self-describing, we should remove the schema parameter to DoGet in C++/Python, as it isn't actually used anywhere. We should also enforce that the first message in the stream is the schema (Java implicitly does this already). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4945) [Flight] Enable Flight integration tests in Travis
David Li created ARROW-4945: --- Summary: [Flight] Enable Flight integration tests in Travis Key: ARROW-4945 URL: https://issues.apache.org/jira/browse/ARROW-4945 Project: Apache Arrow Issue Type: Improvement Components: Continuous Integration, FlightRPC Reporter: David Li Assignee: David Li Fix For: 0.14.0 Need a way to mark the dictionary tests as XFAIL -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4871) [Flight][Java] Handle large Flight messages
David Li created ARROW-4871: --- Summary: [Flight][Java] Handle large Flight messages Key: ARROW-4871 URL: https://issues.apache.org/jira/browse/ARROW-4871 Project: Apache Arrow Issue Type: Bug Components: FlightRPC, Java Reporter: David Li Assignee: David Li Fix For: 0.14.0 Similarly to ARROW-4421, Java/gRPC needs to be configured to allow large messages. The integration tests should also be updated to cover this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4858) [Flight][Python] Enable custom FlightDataStream in Python
David Li created ARROW-4858: --- Summary: [Flight][Python] Enable custom FlightDataStream in Python Key: ARROW-4858 URL: https://issues.apache.org/jira/browse/ARROW-4858 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC, Python Reporter: David Li Assignee: David Li Fix For: 0.14.0 We should be able to provide a custom data stream as the result of Flight do_get in Python. In particular, when returning data produced on the fly, or when returning a large Pandas DataFrame, it'd be nice to provide data in chunks as it becomes available, rather than having to copy everything into a Table first. On the Python side, a FlightDataStream wrapper that accepts RecordBatches from a Python generator should suffice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4796) [Flight][Python] segfault in simple server implementation
David Li created ARROW-4796: --- Summary: [Flight][Python] segfault in simple server implementation Key: ARROW-4796 URL: https://issues.apache.org/jira/browse/ARROW-4796 Project: Apache Arrow Issue Type: Bug Components: FlightRPC, Python Reporter: David Li Assignee: David Li Python segfaults if you implement a Flight server that returns a data stream but does not keep a reference to the underlying data source (the Table, RecordBatch, etc). The Flight bindings themselves do not keep a reference to the object, so the server will segfault as the memory has been reclaimed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4627) [Flight] Add application metadata field to DoPut
David Li created ARROW-4627: --- Summary: [Flight] Add application metadata field to DoPut Key: ARROW-4627 URL: https://issues.apache.org/jira/browse/ARROW-4627 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC Reporter: David Li As [proposed on the mailing list|https://lists.apache.org/thread.html/c550264cd60e000d77e10d9d7ac81ea8c49efc37ad447177fa8ee4ee@%3Cdev.arrow.apache.org%3E], we should add a field for application-specific metadata in DoPut payloads and expose this in the APIs. This also requires changing the client-streaming call into a bidirectional streaming call. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4626) [Flight] Add application metadata field to DoGet
David Li created ARROW-4626: --- Summary: [Flight] Add application metadata field to DoGet Key: ARROW-4626 URL: https://issues.apache.org/jira/browse/ARROW-4626 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC Reporter: David Li As [proposed on the mailing list|https://lists.apache.org/thread.html/c550264cd60e000d77e10d9d7ac81ea8c49efc37ad447177fa8ee4ee@%3Cdev.arrow.apache.org%3E], we should add a field for application-specific metadata in DoGet payloads and expose this in the APIs. The current APIs are rather RecordBatch-oriented, though. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4625) [Flight] Wrap server busy-wait methods
David Li created ARROW-4625: --- Summary: [Flight] Wrap server busy-wait methods Key: ARROW-4625 URL: https://issues.apache.org/jira/browse/ARROW-4625 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC Reporter: David Li Assignee: David Li Right now in Java, you must manually busy-wait in a loop as the gRPC server's awaitTermination method isn't exposed. Conversely, in C++, you have no choice but to busy-wait as starting the server calls awaitTermination for you. Either Java should also wait on the server, or both Java and C++ should expose an explicit operation to wait on the server. I would prefer the latter as then the Python bindings could choose to manually busy-wait, which would let Ctrl-C work as normal. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4587) Flight C++ DoPut segfaults
David Li created ARROW-4587: --- Summary: Flight C++ DoPut segfaults Key: ARROW-4587 URL: https://issues.apache.org/jira/browse/ARROW-4587 Project: Apache Arrow Issue Type: Bug Components: C++, FlightRPC Reporter: David Li Assignee: David Li After Wes fixed the undefined behavior, it turns out the implementation of DoPut on the client side is now wrong. It should construct an IpcPayload instead of going through the underlying Protobuf. Additionally, a previous patch accidentally exposed arrow::ipc::DictionaryBatch under arrow::DictionaryBatch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4575) Add Python Flight implementation to integration testing
David Li created ARROW-4575: --- Summary: Add Python Flight implementation to integration testing Key: ARROW-4575 URL: https://issues.apache.org/jira/browse/ARROW-4575 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC, Integration, Python Reporter: David Li -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4484) [Java] improve Flight DoPut busy wait
David Li created ARROW-4484: --- Summary: [Java] improve Flight DoPut busy wait Key: ARROW-4484 URL: https://issues.apache.org/jira/browse/ARROW-4484 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC, Java Reporter: David Li Currently the implementation of putNext in FlightClient.java busy-waits until gRPC indicates that the server can receive a message. We should either improve the busy-wait (e.g. add sleep times), or rethink the API and make it non-blocking. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
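The first option (adding sleep times) could look roughly like the sketch below: poll a readiness check, sleeping between polls with a capped exponential backoff. The class, method, and parameter names are hypothetical, and the readiness check is passed in as a plain BooleanSupplier rather than tied to any gRPC API.

```java
import java.util.function.BooleanSupplier;

class BackoffWait {
    /**
     * Polls isReady until it returns true, sleeping between polls.
     * The sleep starts at 1 ms and doubles up to maxSleepMillis.
     * Returns how many times the caller had to sleep.
     */
    static int waitUntilReady(BooleanSupplier isReady, long maxSleepMillis)
            throws InterruptedException {
        long sleepMillis = 1;
        int sleeps = 0;
        while (!isReady.getAsBoolean()) {
            sleeps++;
            Thread.sleep(sleepMillis);
            sleepMillis = Math.min(sleepMillis * 2, maxSleepMillis);
        }
        return sleeps;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for "the server can receive a message": true after about 50 ms.
        long deadline = System.currentTimeMillis() + 50;
        int sleeps = waitUntilReady(() -> System.currentTimeMillis() >= deadline, 16);
        System.out.println("slept " + sleeps + " times before the condition held");
    }
}
```

This still blocks the calling thread; it only stops the loop from spinning at full speed. Making the API non-blocking, the second option, would be a larger change.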
[jira] [Created] (ARROW-4474) [Flight] FlightInfo should use signed integer types for payload size
David Li created ARROW-4474: --- Summary: [Flight] FlightInfo should use signed integer types for payload size Key: ARROW-4474 URL: https://issues.apache.org/jira/browse/ARROW-4474 Project: Apache Arrow Issue Type: Bug Components: FlightRPC Reporter: David Li Assignee: David Li The de-facto practice is to use -1 in FlightInfo to indicate that the number of records/size of the payload is unknown, looking at the Java implementation. However, the Protobuf definition uses an unsigned integer type, as does the C++ implementation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
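To make the mismatch concrete, here is a small sketch of what happens to -1 when the same 64 bits are read as an unsigned value. It assumes a 64-bit field, which the report does not state; the class and constant names are hypothetical.

```java
class UnknownSize {
    /** Sentinel used by convention to mean "size unknown". */
    static final long UNKNOWN = -1L;

    /** Renders the bit pattern of value as an unsigned 64-bit integer. */
    static String asUnsigned(long value) {
        return Long.toUnsignedString(value);
    }

    public static void main(String[] args) {
        // A signed reader sees -1; an unsigned reader sees the largest
        // possible 64-bit value, which looks like a real (huge) size.
        System.out.println("signed:   " + UNKNOWN);
        System.out.println("unsigned: " + asUnsigned(UNKNOWN));
    }
}
```

A reader that treats the field as unsigned has no way to tell the sentinel apart from a very large count, which is why a signed type fits the existing convention better.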
[jira] [Created] (ARROW-4419) Deal with body buffers in FlightData
David Li created ARROW-4419: --- Summary: Deal with body buffers in FlightData Key: ARROW-4419 URL: https://issues.apache.org/jira/browse/ARROW-4419 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC Reporter: David Li The Java implementation will fail to decode a schema message if the message also contains (empty) body buffers (see ArrowMessage.asSchema's precondition checks). However, clients using default Protobuf serialization will likely write an empty body buffer by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4230) Enable building flight against system gRPC
David Li created ARROW-4230: --- Summary: Enable building flight against system gRPC Key: ARROW-4230 URL: https://issues.apache.org/jira/browse/ARROW-4230 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC Reporter: David Li Right now Flight assumes that gRPC is vendored or that it is installed with CMake. It would be easier to build if it accepted other installations of gRPC, such as ones from Conda (eventually) or system package managers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4213) [Flight] C++ and Java implementations are incompatible
David Li created ARROW-4213: --- Summary: [Flight] C++ and Java implementations are incompatible Key: ARROW-4213 URL: https://issues.apache.org/jira/browse/ARROW-4213 Project: Apache Arrow Issue Type: Bug Components: FlightRPC Reporter: David Li A C++ client cannot request streams from a Java service, nor can it decode the schema from GetFlightInfo. Schema: in Java, GetFlightInfo encodes the schema directly via flatbuffers. C++ expects it to be encoded as an IPC message. This isn't a problem in Java as a method exists to decode such schemas, but in C++ the API for reading such a schema isn't really exposed. I'm willing to submit a patch for this, but it's not clear to me which scheme is preferred. Streams: in Java, DoGet starts with an ArrowMessage containing a schema. C++ does not expect this and segfaults when it tries to decode the message as a record batch. Based on the presentations I've seen, I think C++ is in the wrong here; I have a patch to fix this that I could clean up and submit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)