[jira] [Created] (ARROW-18346) [Python] Dataset writer API papercuts
David Li created ARROW-18346:

Summary: [Python] Dataset writer API papercuts
Key: ARROW-18346
URL: https://issues.apache.org/jira/browse/ARROW-18346
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Affects Versions: 10.0.0
Reporter: David Li

* Writer options are not very discoverable. Perhaps "file_options" should mention compression as an example of something you can control, so people looking for it know where to go next?
* Compression seems like it might be common enough to warrant a top-level parameter somehow (even if it gets implemented differently internally)?
* Either way, this needs a cookbook example.
* {{make_write_options}} is lacking a docstring.
* Writer options objects are lacking {{__repr__}}s.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (ARROW-18320) [C++] Flight client may crash due to improper Result/Status conversion
David Li created ARROW-18320:

Summary: [C++] Flight client may crash due to improper Result/Status conversion
Key: ARROW-18320
URL: https://issues.apache.org/jira/browse/ARROW-18320
Project: Apache Arrow
Issue Type: Bug
Components: C++, FlightRPC
Affects Versions: 6.0.0
Reporter: David Li

Reported on user@: https://lists.apache.org/thread/84z329t1djhnbr5bq936v4hr8cyngj2l

{noformat}
I have an issue on my project: we have a query execution engine that returns result data as a Flight stream, and a C++ client that receives the stream. When a query has no results but the result schema implies dictionary-encoded fields, the client app crashes. The cause is in cpp/src/arrow/flight/client.cc:461:

::arrow::Result<std::unique_ptr<ipc::Message>> ReadNextMessage() override {
  if (stream_finished_) {
    return nullptr;
  }
  internal::FlightData* data;
  {
    auto guard = read_mutex_ ? std::unique_lock<std::mutex>(*read_mutex_)
                             : std::unique_lock<std::mutex>();
    peekable_reader_->Next(&data);
  }
  if (!data) {
    stream_finished_ = true;
    return stream_->Finish(Status::OK());  // Here the issue
  }
  // Validate IPC message
  auto result = data->OpenMessage();
  if (!result.ok()) {
    return stream_->Finish(std::move(result).status());
  }
  *app_metadata_ = std::move(data->app_metadata);
  return result;
}

The method returns a Result while stream_->Finish(..) returns a Status, so there is an implicit conversion from Status to Result that calls the Result(Status) constructor. But that constructor expects only error statuses, which in turn causes the app to abort:

/// Constructs a Result object with the given non-OK Status object. All
/// calls to ValueOrDie() on this object will abort. The given `status` must
/// not be an OK status, otherwise this constructor will abort.
///
/// This constructor is not declared explicit so that a function with a return
/// type of `Result` can return a Status object, and the status will be
/// implicitly converted to the appropriate return type as a matter of
/// convenience.
///
/// \param status The non-OK Status object to initialize to.
Result(const Status& status) noexcept  // NOLINT(runtime/explicit)
    : status_(status) {
  if (ARROW_PREDICT_FALSE(status.ok())) {
    internal::DieWithMessage(std::string("Constructed with a non-error status: ") +
                             status.ToString());
  }
}

Is there a way to work around or fix this? We use Arrow 6.0.0, but it seems the issue exists in all later versions.
{noformat}
[jira] [Created] (ARROW-18229) [C++][Python] RecordBatchReader can be created with a 'dict' schema which then crashes on use
David Li created ARROW-18229:

Summary: [C++][Python] RecordBatchReader can be created with a 'dict' schema which then crashes on use
Key: ARROW-18229
URL: https://issues.apache.org/jira/browse/ARROW-18229
Project: Apache Arrow
Issue Type: Bug
Components: C++, Python
Affects Versions: 10.0.0
Reporter: David Li

Presumably we should disallow this or convert it to a schema? https://github.com/duckdb/duckdb/issues/5143

{noformat}
>>> import pyarrow as pa
>>> pa.__version__
'10.0.0'
>>> reader = pa.RecordBatchReader.from_batches({"a": pa.int8()}, [])
>>> reader.schema
fish: Job 1, 'python3' terminated by signal SIGSEGV (Address boundary error)

(gdb) bt
#0 0x74247580 in arrow::Schema::num_fields() const () from /home/lidavidm/miniconda3/lib/python3.9/site-packages/pyarrow/libarrow.so.1000
#1 0x742b93f7 in arrow::(anonymous namespace)::SchemaPrinter::Print() () from /home/lidavidm/miniconda3/lib/python3.9/site-packages/pyarrow/libarrow.so.1000
#2 0x742b98a7 in arrow::PrettyPrint(arrow::Schema const&, arrow::PrettyPrintOptions const&, std::string*) () from /home/lidavidm/miniconda3/lib/python3.9/site-packages/pyarrow/libarrow.so.1000
#3 0x764f814b in __pyx_pw_7pyarrow_3lib_6Schema_52to_string(_object*, _object*, _object*) ()
{noformat}
[jira] [Created] (ARROW-18191) [C++] Valgrind failure in arrow-gcsfs-test
David Li created ARROW-18191:

Summary: [C++] Valgrind failure in arrow-gcsfs-test
Key: ARROW-18191
URL: https://issues.apache.org/jira/browse/ARROW-18191
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: David Li

{noformat}
==11267==
==11267== HEAP SUMMARY:
==11267==     in use at exit: 12,091 bytes in 190 blocks
==11267==   total heap usage: 982,685 allocs, 982,495 frees, 1,332,264,705 bytes allocated
==11267==
==11267== 192 bytes in 8 blocks are definitely lost in loss record 35 of 45
==11267==    at 0x40377A5: operator new(unsigned long, std::nothrow_t const&) (vg_replace_malloc.c:542)
==11267==    by 0x682B079: __cxa_thread_atexit (atexit_thread.cc:152)
==11267==    by 0x672F2D6: google::cloud::v2_3_0::internal::OptionsSpan::OptionsSpan(google::cloud::v2_3_0::Options) (in /opt/conda/envs/arrow/lib/libgoogle_cloud_cpp_common.so.2.3.0)
==11267==    by 0x5DFCA33: google::cloud::v2_3_0::Status google::cloud::storage::v2_3_0::Client::DeleteObject(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, google::cloud::storage::v2_3_0::Generation&&) (client.h:1285)
==11267==    by 0x5DFD022: operator() (gcsfs.cc:550)
==11267==    by 0x5DFD022: operator()&)>&, google::cloud::v2_3_0::StatusOr&> (future.h:150)
==11267==    by 0x5DFD022: __invoke_impl&, arrow::fs::GcsFileSystem::Impl::DeleteDirContents(const arrow::fs::(anonymous namespace)::GcsPath&, bool, const arrow::io::IOContext&)::&)>&, google::cloud::v2_3_0::StatusOr&> (invoke.h:60)
==11267==    by 0x5DFD022: __invoke&, arrow::fs::GcsFileSystem::Impl::DeleteDirContents(const arrow::fs::(anonymous namespace)::GcsPath&, bool, const arrow::io::IOContext&)::&)>&, google::cloud::v2_3_0::StatusOr&> (invoke.h:95)
==11267==    by 0x5DFD022: __call (functional:416)
==11267==    by 0x5DFD022: operator()<> (functional:499)
==11267==    by 0x5DFD022: arrow::internal::FnOnce::FnImpl, arrow::fs::GcsFileSystem::Impl::DeleteDirContents(arrow::fs::(anonymous namespace)::GcsPath const&, bool, arrow::io::IOContext const&)::{lambda(google::cloud::v2_3_0::StatusOr const&)#1}, google::cloud::v2_3_0::StatusOr)> >::invoke() (functional.h:152)
==11267==    by 0x50BDAA1: operator() (functional.h:140)
==11267==    by 0x50BDAA1: arrow::internal::WorkerLoop(std::shared_ptr, std::_List_iterator) (thread_pool.cc:243)
==11267==    by 0x50BE161: operator() (thread_pool.cc:414)
==11267==    by 0x50BE161: __invoke_impl > (invoke.h:60)
==11267==    by 0x50BE161: __invoke > (invoke.h:95)
==11267==    by 0x50BE161: _M_invoke<0> (thread:264)
==11267==    by 0x50BE161: operator() (thread:271)
==11267==    by 0x50BE161: std::thread::_State_impl > >::_M_run() (thread:215)
==11267==    by 0x6849A92: execute_native_thread_routine (thread.cc:82)
==11267==    by 0x69666DA: start_thread (pthread_create.c:463)
==11267==    by 0x6C9F61E: clone (clone.S:95)
==11267==
{
   Memcheck:Leak
   match-leak-kinds: definite
   fun:_ZnwmRKSt9nothrow_t
   fun:execute_native_thread_routine
   fun:start_thread
   fun:clone
}
{noformat}
[jira] [Created] (ARROW-18060) [C++] Writing a dataset with 0 rows doesn't create any files
David Li created ARROW-18060:

Summary: [C++] Writing a dataset with 0 rows doesn't create any files
Key: ARROW-18060
URL: https://issues.apache.org/jira/browse/ARROW-18060
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Affects Versions: 9.0.0
Reporter: David Li

If the input data has no rows, no files get created. This is potentially unexpected, as it looks like "nothing happened". It might be nicer to create an empty file. With partitioning, though, that gets weird (there are no partition values), so maybe an error would make more sense instead.

Reproduction in Python:

{code:python}
import tempfile
from pathlib import Path

import pyarrow
import pyarrow.dataset

print("PyArrow version:", pyarrow.__version__)

table = pyarrow.table([
    [],
], schema=pyarrow.schema([
    ("ints", "int64"),
]))

with tempfile.TemporaryDirectory() as d:
    pyarrow.dataset.write_dataset(table, d, format="feather")
    print(list(Path(d).iterdir()))
{code}

Output:

{noformat}
> python repro.py
PyArrow version: 9.0.0
[]
{noformat}
[jira] [Created] (ARROW-18035) [Java] Enable allocator logging in CI
David Li created ARROW-18035:

Summary: [Java] Enable allocator logging in CI
Key: ARROW-18035
URL: https://issues.apache.org/jira/browse/ARROW-18035
Project: Apache Arrow
Issue Type: Improvement
Components: Java
Reporter: David Li

This would help track down certain flaky tests.
[jira] [Created] (ARROW-18034) [Java][FlightRPC] TestBasicOperation.getStreamLargeBatch is flaky on Windows CI
David Li created ARROW-18034:

Summary: [Java][FlightRPC] TestBasicOperation.getStreamLargeBatch is flaky on Windows CI
Key: ARROW-18034
URL: https://issues.apache.org/jira/browse/ARROW-18034
Project: Apache Arrow
Issue Type: Bug
Components: FlightRPC, Java
Reporter: David Li

{noformat}
java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (134217728)
Allocator(ROOT) 0/134217728/270532608/9223372036854775807 (res/actual/peak/limit)
	at org.apache.arrow.memory.BaseAllocator.close(BaseAllocator.java:437)
	at org.apache.arrow.memory.RootAllocator.close(RootAllocator.java:29)
	at org.apache.arrow.flight.TestBasicOperation$Producer.close(TestBasicOperation.java:514)
	at org.apache.arrow.flight.TestBasicOperation.test(TestBasicOperation.java:333)
	at org.apache.arrow.flight.TestBasicOperation.test(TestBasicOperation.java:312)
	at org.apache.arrow.flight.TestBasicOperation.getStreamLargeBatch(TestBasicOperation.java:270)
{noformat}
[jira] [Created] (ARROW-17971) [Format][Docs] Add ADBC page
David Li created ARROW-17971:

Summary: [Format][Docs] Add ADBC page
Key: ARROW-17971
URL: https://issues.apache.org/jira/browse/ARROW-17971
Project: Apache Arrow
Issue Type: New Feature
Components: Documentation, Format
Reporter: David Li
Assignee: David Li

See ML vote thread: https://lists.apache.org/thread/7gb8dooz554ykbk5wlrngzkgmq0qx7y0
[jira] [Created] (ARROW-17914) [Java] Support reading a subset of fields from an IPC file or stream
David Li created ARROW-17914:

Summary: [Java] Support reading a subset of fields from an IPC file or stream
Key: ARROW-17914
URL: https://issues.apache.org/jira/browse/ARROW-17914
Project: Apache Arrow
Issue Type: Improvement
Components: Java
Reporter: David Li

C++ supports {{IpcReadOptions.included_fields}}, which lets you load a subset of (top-level) fields from an IPC file or stream, potentially saving on I/O costs. It would be useful to support this in Java as well. Some refactoring would be required, since MessageSerializer currently reads record batch messages as a whole, and it would be good to quantify how much of a benefit this provides in different scenarios.
[jira] [Created] (ARROW-17867) [C++][FlightRPC] Expose bulk parameter binding in Flight SQL client
David Li created ARROW-17867:

Summary: [C++][FlightRPC] Expose bulk parameter binding in Flight SQL client
Key: ARROW-17867
URL: https://issues.apache.org/jira/browse/ARROW-17867
Project: Apache Arrow
Issue Type: Improvement
Reporter: David Li
Assignee: David Li

Also fix various issues noticed as part of ARROW-17661.
[jira] [Created] (ARROW-17857) [C++] Table::CombineChunksToBatch segfaults on empty tables
David Li created ARROW-17857:

Summary: [C++] Table::CombineChunksToBatch segfaults on empty tables
Key: ARROW-17857
URL: https://issues.apache.org/jira/browse/ARROW-17857
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: David Li
Assignee: David Li

There can be 0 chunks in a ChunkedArray.
[jira] [Created] (ARROW-17840) [Java] Remove flaky JaCoCo check in JDBC driver
David Li created ARROW-17840:

Summary: [Java] Remove flaky JaCoCo check in JDBC driver
Key: ARROW-17840
URL: https://issues.apache.org/jira/browse/ARROW-17840
Project: Apache Arrow
Issue Type: Improvement
Components: Java
Reporter: David Li

It doesn't seem to bring much value, and it can make builds flaky (e.g. a branch may or may not be hit depending on when exactly an exception occurs).
[jira] [Created] (ARROW-17830) [C++][Gandiva] AppVeyor Windows builds failing due to 'diaguids.lib'
David Li created ARROW-17830:

Summary: [C++][Gandiva] AppVeyor Windows builds failing due to 'diaguids.lib'
Key: ARROW-17830
URL: https://issues.apache.org/jira/browse/ARROW-17830
Project: Apache Arrow
Issue Type: Bug
Components: C++, C++ - Gandiva
Reporter: David Li

Observed in AppVeyor across a few PRs:

{noformat}
(arrow) C:\projects\arrow\cpp\build>cmake --build . --target install --config Release || exit /B
ninja: error: 'C:/Program Files (x86)/Microsoft Visual Studio/2019/Enterprise/DIA SDK/lib/amd64/diaguids.lib', needed by 'release/gandiva.dll', missing and no known rule to make it
{noformat}
[jira] [Created] (ARROW-17810) [Java] Update JaCoCo to 0.8.8 for Java 18 support in CI
David Li created ARROW-17810:

Summary: [Java] Update JaCoCo to 0.8.8 for Java 18 support in CI
Key: ARROW-17810
URL: https://issues.apache.org/jira/browse/ARROW-17810
Project: Apache Arrow
Issue Type: Improvement
Components: Java
Reporter: David Li
Assignee: David Li

Not sure why this didn't fail before, but we need to bump JaCoCo for Java 18 to work:

{noformat}
java.lang.instrument.IllegalClassFormatException: Error while instrumenting org/apache/calcite/avatica/AvaticaConnection$MockitoMock$854659140$auxiliary$kA4H37GT.
	at org.jacoco.agent.rt.internal_3570298.CoverageTransformer.transform(CoverageTransformer.java:94)
	at java.instrument/java.lang.instrument.ClassFileTransformer.transform(ClassFileTransformer.java:244)
	at java.instrument/sun.instrument.TransformerManager.transform(TransformerManager.java:188)
	at java.instrument/sun.instrument.InstrumentationImpl.transform(InstrumentationImpl.java:541)
	at java.base/java.lang.ClassLoader.defineClass1(Native Method)
	at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1013)
	at java.base/java.lang.ClassLoader$ByteBuddyAccessor$PXg8JwS3.defineClass(Unknown Source)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:577)
	at net.bytebuddy.dynamic.loading.ClassInjector$UsingReflection$Dispatcher$UsingUnsafeInjection.defineClass(ClassInjector.java:1027)
	at net.bytebuddy.dynamic.loading.ClassInjector$UsingReflection.injectRaw(ClassInjector.java:279)
	at net.bytebuddy.dynamic.loading.ClassInjector$AbstractBase.inject(ClassInjector.java:114)
	at net.bytebuddy.dynamic.loading.ClassLoadingStrategy$Default$InjectionDispatcher.load(ClassLoadingStrategy.java:233)
	at net.bytebuddy.dynamic.TypeResolutionStrategy$Passive.initialize(TypeResolutionStrategy.java:100)
	at net.bytebuddy.dynamic.DynamicType$Default$Unloaded.load(DynamicType.java:6154)
	at org.mockito.internal.creation.bytebuddy.SubclassBytecodeGenerator.mockClass(SubclassBytecodeGenerator.java:268)
	at org.mockito.internal.creation.bytebuddy.TypeCachingBytecodeGenerator.lambda$mockClass$0(TypeCachingBytecodeGenerator.java:47)
	at net.bytebuddy.TypeCache.findOrInsert(TypeCache.java:153)
	at net.bytebuddy.TypeCache$WithInlineExpunction.findOrInsert(TypeCache.java:366)
	at net.bytebuddy.TypeCache.findOrInsert(TypeCache.java:175)
	at net.bytebuddy.TypeCache$WithInlineExpunction.findOrInsert(TypeCache.java:377)
	at org.mockito.internal.creation.bytebuddy.TypeCachingBytecodeGenerator.mockClass(TypeCachingBytecodeGenerator.java:40)
	at org.mockito.internal.creation.bytebuddy.InlineBytecodeGenerator.mockClass(InlineBytecodeGenerator.java:216)
	at org.mockito.internal.creation.bytebuddy.TypeCachingBytecodeGenerator.lambda$mockClass$0(TypeCachingBytecodeGenerator.java:47)
	at net.bytebuddy.TypeCache.findOrInsert(TypeCache.java:153)
	at net.bytebuddy.TypeCache$WithInlineExpunction.findOrInsert(TypeCache.java:366)
	at net.bytebuddy.TypeCache.findOrInsert(TypeCache.java:175)
	at net.bytebuddy.TypeCache$WithInlineExpunction.findOrInsert(TypeCache.java:377)
	at org.mockito.internal.creation.bytebuddy.TypeCachingBytecodeGenerator.mockClass(TypeCachingBytecodeGenerator.java:40)
	at org.mockito.internal.creation.bytebuddy.InlineDelegateByteBuddyMockMaker.createMockType(InlineDelegateByteBuddyMockMaker.java:391)
	at org.mockito.internal.creation.bytebuddy.InlineDelegateByteBuddyMockMaker.doCreateMock(InlineDelegateByteBuddyMockMaker.java:351)
	at org.mockito.internal.creation.bytebuddy.InlineDelegateByteBuddyMockMaker.createMock(InlineDelegateByteBuddyMockMaker.java:330)
	at org.mockito.internal.creation.bytebuddy.InlineByteBuddyMockMaker.createMock(InlineByteBuddyMockMaker.java:58)
	at org.mockito.internal.util.MockUtil.createMock(MockUtil.java:53)
	at org.mockito.internal.MockitoCore.mock(MockitoCore.java:84)
	at org.mockito.Mockito.mock(Mockito.java:1964)
	at org.mockito.internal.configuration.MockAnnotationProcessor.processAnnotationForMock(MockAnnotationProcessor.java:66)
	at org.mockito.internal.configuration.MockAnnotationProcessor.process(MockAnnotationProcessor.java:27)
	at org.mockito.internal.configuration.MockAnnotationProcessor.process(MockAnnotationProcessor.java:24)
	at org.mockito.internal.configuration.IndependentAnnotationEngine.createMockFor(IndependentAnnotationEngine.java:45)
	at org.mockito.internal.configuration.IndependentAnnotationEngine.process(IndependentAnnotationEngine.java:73)
{noformat}
[jira] [Created] (ARROW-17797) [Java] Remove deprecated methods from Java dataset module in Arrow 11
David Li created ARROW-17797:

Summary: [Java] Remove deprecated methods from Java dataset module in Arrow 11
Key: ARROW-17797
URL: https://issues.apache.org/jira/browse/ARROW-17797
Project: Apache Arrow
Issue Type: Improvement
Components: Java
Reporter: David Li

ARROW-15745 deprecated some things in the Dataset module which should be removed for Arrow >= 11.
[jira] [Created] (ARROW-17787) [Docs][Java] javadoc failing on flight-integration-tests
David Li created ARROW-17787:

Summary: [Docs][Java] javadoc failing on flight-integration-tests
Key: ARROW-17787
URL: https://issues.apache.org/jira/browse/ARROW-17787
Project: Apache Arrow
Issue Type: Bug
Components: Documentation, Java
Reporter: David Li
Assignee: David Li

Observed on master:

{noformat}
Loading source files for package org.apache.arrow.flight.integration.tests...
Constructing Javadoc information...
1 error
[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Arrow Java Root POM 10.0.0-SNAPSHOT . SUCCESS [07:35 min]
[INFO] Arrow Format ... SUCCESS [ 26.940 s]
[INFO] Arrow Memory ... SUCCESS [ 23.462 s]
[INFO] Arrow Memory - Core ... SUCCESS [ 13.328 s]
[INFO] Arrow Memory - Unsafe .. SUCCESS [ 14.376 s]
[INFO] Arrow Memory - Netty ... SUCCESS [ 16.075 s]
[INFO] Arrow Vectors .. SUCCESS [05:51 min]
[INFO] Arrow Compression .. SUCCESS [ 36.824 s]
[INFO] Arrow Tools ... SUCCESS [ 43.014 s]
[INFO] Arrow JDBC Adapter . SUCCESS [ 40.846 s]
[INFO] Arrow Plasma Client ... SUCCESS [ 26.950 s]
[INFO] Arrow Flight ... SUCCESS [ 23.166 s]
[INFO] Arrow Flight Core .. SUCCESS [02:01 min]
[INFO] Arrow Flight GRPC .. SUCCESS [ 33.919 s]
[INFO] Arrow Flight SQL ... SUCCESS [ 27.265 s]
[INFO] Arrow Flight SQL JDBC Driver ... SKIPPED
[INFO] Arrow Flight Integration Tests . FAILURE [ 16.021 s]
[INFO] Arrow AVRO Adapter . SUCCESS [ 38.905 s]
[INFO] Arrow Algorithms ... SUCCESS [ 30.490 s]
[INFO] Arrow Performance Benchmarks 10.0.0-SNAPSHOT ... SUCCESS [ 43.648 s]
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 17:06 min (Wall Clock)
[INFO] Finished at: 2022-09-20T16:26:54Z
[INFO]
Error: Failed to execute goal org.apache.maven.plugins:maven-site-plugin:3.5.1:site (default-site) on project flight-integration-tests: Error generating maven-javadoc-plugin:3.0.0-M1:test-javadoc: Error: Exit code: 1 - javadoc: error - No public or protected classes found to document.
Error:
Error: Command line was: /usr/lib/jvm/java-8-openjdk-amd64/jre/../bin/javadoc @options @packages
Error:
Error: Refer to the generated Javadoc files in '/arrow/java/flight/flight-integration-tests/target/site/testapidocs' dir.
Error: -> [Help 1]
Error:
Error: To see the full stack trace of the errors, re-run Maven with the -e switch.
Error: Re-run Maven using the -X switch to enable full debug logging.
Error:
Error: For more information about the errors and possible solutions, please read the following articles:
Error: [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
Error:
Error: After correcting the problems, you can resume the build with the command
Error:   mvn -rf :flight-integration-tests
{noformat}
[jira] [Created] (ARROW-17785) [Java] Flakiness in JDBC driver test ArrowFlightJdbcConnectionCookieTest.testCookies
David Li created ARROW-17785:

Summary: [Java] Flakiness in JDBC driver test ArrowFlightJdbcConnectionCookieTest.testCookies
Key: ARROW-17785
URL: https://issues.apache.org/jira/browse/ARROW-17785
Project: Apache Arrow
Issue Type: Bug
Components: Java
Reporter: David Li
Assignee: David Li

I think we should just suppress this kind of exception in Flight SQL, as it's not really actionable.

{noformat}
Error: org.apache.arrow.driver.jdbc.ArrowFlightJdbcConnectionCookieTest.testCookies  Time elapsed: 0.805 s  <<< ERROR!
java.sql.SQLException: While closing statement
	at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
	at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
	at org.apache.calcite.avatica.AvaticaStatement.close(AvaticaStatement.java:254)
	at org.apache.arrow.driver.jdbc.ArrowFlightJdbcConnectionCookieTest.testCookies(ArrowFlightJdbcConnectionCookieTest.java:51)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.rules.Verifier$1.evaluate(Verifier.java:35)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.apache.arrow.driver.jdbc.FlightServerTestRule$1.evaluate(FlightServerTestRule.java:166)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
	at org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
	at org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
	at org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:147)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:127)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:90)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:55)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:102)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:54)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:114)

Sep 19, 2022 12:52:16 AM io.grpc.netty.NettyServerHandler onStreamError
WARNING: Stream Error
io.netty.handler.codec.http2.Http2Exception$StreamException: Stream closed before write could take place
	at io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:173)
	at io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:481)
	at io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$1.onStreamClosed(DefaultHttp2RemoteFlowController.java:105)
	at io.netty.handler.codec.http2.DefaultHttp2Connection.notifyClosed(DefaultHttp2Connection.java:357)
	at io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.removeFromActiveStreams(DefaultHttp2Connection.java:1007)
{noformat}
[jira] [Created] (ARROW-17741) [Packaging] Add JDBC driver to release tasks
David Li created ARROW-17741:

Summary: [Packaging] Add JDBC driver to release tasks
Key: ARROW-17741
URL: https://issues.apache.org/jira/browse/ARROW-17741
Project: Apache Arrow
Issue Type: Sub-task
Components: Packaging
Reporter: David Li
Assignee: David Li

The java-jars task has a list of artifacts to upload; the JDBC driver needs to be included there: https://github.com/apache/arrow/blob/7cfdfbb0d5472f8f8893398b51042a3ca1dd0adf/dev/tasks/tasks.yml#L816-L820
[jira] [Created] (ARROW-17732) [Docs][Java] Add documentation page for Flight SQL JDBC driver
David Li created ARROW-17732:

Summary: [Docs][Java] Add documentation page for Flight SQL JDBC driver
Key: ARROW-17732
URL: https://issues.apache.org/jira/browse/ARROW-17732
Project: Apache Arrow
Issue Type: Sub-task
Components: Documentation, Java
Reporter: David Li
Assignee: David Li
[jira] [Created] (ARROW-17731) [Website] Add blog post about Flight SQL JDBC driver
David Li created ARROW-17731:

Summary: [Website] Add blog post about Flight SQL JDBC driver
Key: ARROW-17731
URL: https://issues.apache.org/jira/browse/ARROW-17731
Project: Apache Arrow
Issue Type: Improvement
Components: FlightRPC, Website
Reporter: David Li
Assignee: David Li
Fix For: 10.0.0
[jira] [Created] (ARROW-17729) [Java][FlightRPC] Flight SQL JDBC driver improvements
David Li created ARROW-17729:

Summary: [Java][FlightRPC] Flight SQL JDBC driver improvements
Key: ARROW-17729
URL: https://issues.apache.org/jira/browse/ARROW-17729
Project: Apache Arrow
Issue Type: Improvement
Components: FlightRPC, Java
Reporter: David Li

Follow-ups for ARROW-7744:

* Rename internal classes to not imply everything is part of Flight RPC (e.g. ArrowFlightJdbcArray -> FieldVectorArray or similar)
* Don't throw bare exceptions (always provide some error context)
* Log a warning if the {{arrow-flight:}} URI scheme is used instead of {{arrow-flight-sql:}}
* Create a documentation page (one that can be used by people approaching this from the JDBC side, not necessarily Arrow users)
* Replace {{// TODO}} comments with {{throw new UnsupportedOperationException()}}
* Document how timestamp/time/date types are handled when converting between the two type systems
* Document the type conversions in general
* [Timestamp handling is suspect|https://github.com/apache/arrow/pull/13800#discussion_r938908230]
* Upgrade to JUnit5/AssertJ instead of JUnit4/Hamcrest
* Get rid of FreePortFinder
[jira] [Created] (ARROW-17718) [C++][Java][FlightRPC] Get rid of FlightTestUtil.getStartedServer etc.
David Li created ARROW-17718:

Summary: [C++][Java][FlightRPC] Get rid of FlightTestUtil.getStartedServer etc.
Key: ARROW-17718
URL: https://issues.apache.org/jira/browse/ARROW-17718
Project: Apache Arrow
Issue Type: Bug
Components: FlightRPC, Java
Reporter: David Li

Anything expecting to bind to a random port in CI is an antipattern and makes tests flaky. All tests should bind to port 0 and let the OS assign a port.
[jira] [Created] (ARROW-17688) [Format][FlightRPC][C++][Java] Add Substrait for Flight SQL
David Li created ARROW-17688:

Summary: [Format][FlightRPC][C++][Java] Add Substrait for Flight SQL
Key: ARROW-17688
URL: https://issues.apache.org/jira/browse/ARROW-17688
Project: Apache Arrow
Issue Type: New Feature
Components: C++, FlightRPC, Format, Java
Reporter: David Li
Assignee: David Li

See ML: https://lists.apache.org/thread/3k3np6314dwb0n7n1hrfwony5fcy7kzl
[jira] [Created] (ARROW-17675) [C++] Null pointer dereference in Substrait.BasicPlanRoundTripping
David Li created ARROW-17675:

Summary: [C++] Null pointer dereference in Substrait.BasicPlanRoundTripping
Key: ARROW-17675
URL: https://issues.apache.org/jira/browse/ARROW-17675
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: David Li

{noformat}
[ RUN      ] Substrait.BasicPlanRoundTripping
file_path_str /tmp/substrait-tempdir-3pvz0v47/
/arrow/cpp/src/arrow/dataset/file_base.cc:97:19: runtime error: member call on null pointer of type 'arrow::Buffer'
    #0 0x7fba39909ef1 in arrow::dataset::FileSource::Equals(arrow::dataset::FileSource const&) const /arrow/cpp/src/arrow/dataset/file_base.cc:97:19
    #1 0x7fba3990f1ed in arrow::dataset::FileFragment::Equals(arrow::dataset::FileFragment const&) const /arrow/cpp/src/arrow/dataset/file_base.cc:147:18
    #2 0x76e22c in arrow::engine::Substrait_BasicPlanRoundTripping_Test::TestBody() /arrow/cpp/src/arrow/engine/substrait/serde_test.cc:1977:5
    #3 0x7fba3c92fa9a in void testing::internal::HandleSehExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::*)(), char const*) /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2607:10
    #4 0x7fba3c915759 in void testing::internal::HandleExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::*)(), char const*) /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2643:14
    #5 0x7fba3c8ef652 in testing::Test::Run() /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2682:5
    #6 0x7fba3c8f0418 in testing::TestInfo::Run() /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2861:11
    #7 0x7fba3c8f0c33 in testing::TestSuite::Run() /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:3015:28
    #8 0x7fba3c901a14 in testing::internal::UnitTestImpl::RunAllTests() /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:5855:44
    #9 0x7fba3c93289a in bool testing::internal::HandleSehExceptionsInMethodIfSupported(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2607:10
    #10 0x7fba3c917f79 in bool testing::internal::HandleExceptionsInMethodIfSupported(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2643:14
    #11 0x7fba3c901570 in testing::UnitTest::Run() /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:5438:10
    #12 0x7fba3c968210 in RUN_ALL_TESTS() /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/include/gtest/gtest.h:2490:46
    #13 0x7fba3c9681ec in main /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest_main.cc:52:10
    #14 0x7fba1b6f8082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)
    #15 0x4d4b2d in _start (/build/cpp/debug/arrow-substrait-substrait-test+0x4d4b2d)

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /arrow/cpp/src/arrow/dataset/file_base.cc:97:19 in /build/cpp/src/arrow/engine
{noformat}

https://github.com/apache/arrow/runs/8274057341?check_suite_focus=true
[jira] [Created] (ARROW-17661) [C++][Python][FlightRPC] Add Flight SQL ADBC driver and Python bindings
David Li created ARROW-17661: Summary: [C++][Python][FlightRPC] Add Flight SQL ADBC driver and Python bindings Key: ARROW-17661 URL: https://issues.apache.org/jira/browse/ARROW-17661 Project: Apache Arrow Issue Type: New Feature Components: C++, FlightRPC, Python Reporter: David Li Assignee: David Li Pending ADBC acceptance. This will finally make Flight SQL accessible in Python, though it will rely on having the ADBC driver manager available to provide the Python bindings. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17645) [CI] conda-integration builds failing due to pinned zlib
David Li created ARROW-17645: Summary: [CI] conda-integration builds failing due to pinned zlib Key: ARROW-17645 URL: https://issues.apache.org/jira/browse/ARROW-17645 Project: Apache Arrow Issue Type: Bug Reporter: David Li {noformat} Encountered problems while solving: - package libsqlite-3.39.2-h753d276_1 requires libzlib >=1.2.12,<1.3.0a0, but none of the providers can be installed {noformat} but in ARROW-17410 we pinned zlib to 1.2.11 to avoid a zlib bug that was causing failures in JS tests -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17604) [Java][Docs] Improve docs around JVM flags
David Li created ARROW-17604: Summary: [Java][Docs] Improve docs around JVM flags Key: ARROW-17604 URL: https://issues.apache.org/jira/browse/ARROW-17604 Project: Apache Arrow Issue Type: Improvement Components: Documentation, Java Reporter: David Li * Clarify where the {{--add-opens}} flag should be added (as an argument to {{java}}) * Demonstrate how to configure Surefire with it * Demonstrate how to configure IntelliJ with it -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17603) [C++][FlightRPC] Print build logs if gRPC TlsCredentialsOptions detection fails
David Li created ARROW-17603: Summary: [C++][FlightRPC] Print build logs if gRPC TlsCredentialsOptions detection fails Key: ARROW-17603 URL: https://issues.apache.org/jira/browse/ARROW-17603 Project: Apache Arrow Issue Type: Improvement Components: C++, FlightRPC Reporter: David Li Make it easier to debug build failures in CI. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17568) [FlightRPC][Integration] Ensure all RPC methods are covered by integration testing
David Li created ARROW-17568: Summary: [FlightRPC][Integration] Ensure all RPC methods are covered by integration testing Key: ARROW-17568 URL: https://issues.apache.org/jira/browse/ARROW-17568 Project: Apache Arrow Issue Type: Improvement Components: C++, FlightRPC, Go, Integration, Java Reporter: David Li This would help catch issues like https://github.com/apache/arrow/issues/13853 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17558) [C++][FlightRPC] Inconsistent use of int, int32_t, uint32_t for SqlInfo enum values
David Li created ARROW-17558: Summary: [C++][FlightRPC] Inconsistent use of int, int32_t, uint32_t for SqlInfo enum values Key: ARROW-17558 URL: https://issues.apache.org/jira/browse/ARROW-17558 Project: Apache Arrow Issue Type: Bug Components: C++, FlightRPC Reporter: David Li These should all be uint32_t, always. Not a big deal in practice at least. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17538) [C++] Importing an ArrowArrayStream can't handle errors from get_schema
David Li created ARROW-17538: Summary: [C++] Importing an ArrowArrayStream can't handle errors from get_schema Key: ARROW-17538 URL: https://issues.apache.org/jira/browse/ARROW-17538 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 9.0.0 Reporter: David Li As indicated in the code: https://github.com/apache/arrow/blob/cd3c6ead97d584366aafd2f14d99a1cb8ace9ca2/cpp/src/arrow/c/bridge.cc#L1823 This probably needs a static initializer so we can catch and report errors from get_schema. -- This message was sent by Atlassian Jira (v8.20.10#820010)
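The pattern at issue can be sketched outside C++. In this minimal, hypothetical Python illustration (MockStream and import_stream are made-up names, not Arrow's actual C data interface), the importer checks get_schema's return code eagerly, so the error has a channel back to the caller instead of being lost during lazy initialization:

```python
class MockStream:
    """Stand-in for an ArrowArrayStream's callbacks (illustrative only)."""

    def __init__(self, fail=False):
        self.fail = fail

    def get_schema(self):
        # Returns an (errno-style code, schema) pair; a nonzero code
        # signals failure, mirroring the C data interface convention.
        if self.fail:
            return 5, None  # EIO
        return 0, "int32 schema"


def import_stream(stream):
    # Call get_schema eagerly at import time and surface its error to
    # the caller, instead of deferring the call to a point where no
    # error can be propagated.
    code, schema = stream.get_schema()
    if code != 0:
        raise OSError(code, "get_schema failed")
    return schema


assert import_stream(MockStream()) == "int32 schema"
```

The same shape applies to the C++ importer: a static factory that can return a Status, rather than a constructor that cannot.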
[jira] [Created] (ARROW-17537) [Java][FlightRPC] Update benchmark to be on par with C++
David Li created ARROW-17537: Summary: [Java][FlightRPC] Update benchmark to be on par with C++ Key: ARROW-17537 URL: https://issues.apache.org/jira/browse/ARROW-17537 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC, Java Reporter: David Li See https://github.com/apache/arrow/issues/13980 The Java benchmark isn't comparable out of the box (and it seems like there's an unexplained gap between it and the C++ benchmark) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17433) [C++] AppVeyor build fails due to Boost/Flight
David Li created ARROW-17433: Summary: [C++] AppVeyor build fails due to Boost/Flight Key: ARROW-17433 URL: https://issues.apache.org/jira/browse/ARROW-17433 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: David Li Observed on master {noformat} [182/351] Building CXX object src\arrow\filesystem\CMakeFiles\arrow-s3fs-test.dir\Unity\unity_0_cxx.cxx.obj FAILED: src/arrow/filesystem/CMakeFiles/arrow-s3fs-test.dir/Unity/unity_0_cxx.cxx.obj C:\Miniconda37-x64\Scripts\clcache.exe /nologo /TP -DARROW_HAVE_RUNTIME_AVX2 -DARROW_HAVE_RUNTIME_AVX512 -DARROW_HAVE_RUNTIME_BMI2 -DARROW_HAVE_RUNTIME_SSE4_2 -DARROW_HAVE_SSE4_2 -DARROW_HDFS -DARROW_MIMALLOC -DARROW_WITH_BROTLI -DARROW_WITH_BZ2 -DARROW_WITH_LZ4 -DARROW_WITH_RE2 -DARROW_WITH_SNAPPY -DARROW_WITH_UTF8PROC -DARROW_WITH_ZLIB -DARROW_WITH_ZSTD -DAWS_CAL_USE_IMPORT_EXPORT -DAWS_CHECKSUMS_USE_IMPORT_EXPORT -DAWS_COMMON_USE_IMPORT_EXPORT -DAWS_EVENT_STREAM_USE_IMPORT_EXPORT -DAWS_IO_USE_IMPORT_EXPORT -DAWS_SDK_VERSION_MAJOR=1 -DAWS_SDK_VERSION_MINOR=8 -DAWS_SDK_VERSION_PATCH=186 -DAWS_USE_IO_COMPLETION_PORTS -DBOOST_ALL_DYN_LINK -DBOOST_ALL_NO_LIB -DBOOST_ATOMIC_DYN_LINK -DBOOST_ATOMIC_NO_LIB -DBOOST_FILESYSTEM_DYN_LINK -DBOOST_FILESYSTEM_NO_LIB -DBOOST_SYSTEM_DYN_LINK -DBOOST_SYSTEM_NO_LIB -DPROTOBUF_USE_DLLS -DURI_STATIC_BUILD -DUSE_IMPORT_EXPORT -DUSE_IMPORT_EXPORT=1 -DUSE_WINDOWS_DLL_SEMANTICS -D_CRT_SECURE_NO_WARNINGS -D_ENABLE_EXTENDED_ALIGNED_STORAGE -IC:\projects\arrow\cpp\build\src -IC:\projects\arrow\cpp\src -IC:\projects\arrow\cpp\src\generated -IC:\projects\arrow\cpp\thirdparty\flatbuffers\include -IC:\Miniconda37-x64\envs\arrow\Library\include -IC:\projects\arrow\cpp\thirdparty\hadoop\include -IC:\projects\arrow\cpp\build\mimalloc_ep\src\mimalloc_ep\include\mimalloc-2.0 /DWIN32 /D_WINDOWS /GR /EHsc /D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING /EHsc /wd5105 /bigobj /utf-8 /W3 /wd4800 /wd4996 /wd4065 /WX /MP /MD /Od /UNDEBUG /showIncludes 
/Fosrc\arrow\filesystem\CMakeFiles\arrow-s3fs-test.dir\Unity\unity_0_cxx.cxx.obj /Fdsrc\arrow\filesystem\CMakeFiles\arrow-s3fs-test.dir\ /FS -c C:\projects\arrow\cpp\build\src\arrow\filesystem\CMakeFiles\arrow-s3fs-test.dir\Unity\unity_0_cxx.cxx Please define _WIN32_WINNT or _WIN32_WINDOWS appropriately. For example: - add -D_WIN32_WINNT=0x0601 to the compiler command line; or - add _WIN32_WINNT=0x0601 to your project's Preprocessor Definitions. Assuming _WIN32_WINNT=0x0601 (i.e. Windows 7 target). C:\Miniconda37-x64\envs\arrow\Library\include\boost/process/environment.hpp(266): error C2220: warning treated as error - no 'object' file generated C:\Miniconda37-x64\envs\arrow\Library\include\boost/process/environment.hpp(261): note: while compiling class template member function 'boost::iterators::transform_iterator>,Char **,boost::process::detail::entry>,boost::process::detail::entry>> boost::process::basic_environment_impl::find(const std::basic_string,std::allocator> &)' with [ Char=char ] C:\Miniconda37-x64\envs\arrow\Library\include\boost/process/environment.hpp(361): note: see reference to function template instantiation 'boost::iterators::transform_iterator>,Char **,boost::process::detail::entry>,boost::process::detail::entry>> boost::process::basic_environment_impl::find(const std::basic_string,std::allocator> &)' being compiled with [ Char=char ] C:\Miniconda37-x64\envs\arrow\Library\include\boost/process/environment.hpp(632): note: see reference to class template instantiation 'boost::process::basic_environment_impl' being compiled with [ Char=char ] C:\Miniconda37-x64\envs\arrow\Library\include\boost/process/env.hpp(176): note: see reference to class template instantiation 'boost::process::basic_environment' being compiled C:\Miniconda37-x64\envs\arrow\Library\include\boost/process/env.hpp(183): note: see reference to class template instantiation 'boost::process::detail::env_init' being compiled 
C:\Miniconda37-x64\envs\arrow\Library\include\boost/asio/execution/relationship.hpp(595): note: see reference to class template instantiation 'boost::asio::execution::detail::relationship_t<0>' being compiled C:\Miniconda37-x64\envs\arrow\Library\include\boost/asio/execution/outstanding_work.hpp(597): note: see reference to class template instantiation 'boost::asio::execution::detail::outstanding_work_t<0>' being compiled C:\Miniconda37-x64\envs\arrow\Library\include\boost/asio/execution/occupancy.hpp(163): note: see reference to class template instantiation 'boost::asio::execution::detail::occupancy_t<0>' being compiled C:\Miniconda37-x64\envs\arrow\Library\include\boost/asio/execution/mapping.hpp(764): note: see reference to class template instantiation 'boost::asio::execution::detail::mapping_t<0>' being
[jira] [Created] (ARROW-17420) [C++][FlightRPC] Flight SQL integration tests don't fully compare schema definitions
David Li created ARROW-17420: Summary: [C++][FlightRPC] Flight SQL integration tests don't fully compare schema definitions Key: ARROW-17420 URL: https://issues.apache.org/jira/browse/ARROW-17420 Project: Apache Arrow Issue Type: Improvement Components: C++, FlightRPC Reporter: David Li Matt pointed this out in the Go tests: https://github.com/apache/arrow/pull/13868#discussion_r945827399 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17413) [JS] Integration test build fails with 'gulp-google-closure-compiler: java.util.zip.ZipException: invalid entry CRC'
David Li created ARROW-17413: Summary: [JS] Integration test build fails with 'gulp-google-closure-compiler: java.util.zip.ZipException: invalid entry CRC' Key: ARROW-17413 URL: https://issues.apache.org/jira/browse/ARROW-17413 Project: Apache Arrow Issue Type: Improvement Components: Integration, JavaScript Reporter: David Li Seen on master, some PRs {noformat} [07:42:29] Error: gulp-google-closure-compiler: java.util.zip.ZipException: invalid entry CRC (expected 0x4e1f14a4 but got 0xb1e0eb5b) at java.util.zip.ZipInputStream.readEnd(ZipInputStream.java:410) at java.util.zip.ZipInputStream.read(ZipInputStream.java:199) at java.util.zip.ZipInputStream.closeEntry(ZipInputStream.java:143) at java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:121) at com.google.javascript.jscomp.AbstractCommandLineRunner.getBuiltinExterns(AbstractCommandLineRunner.java:500) at com.google.javascript.jscomp.CommandLineRunner.createExterns(CommandLineRunner.java:2084) at com.google.javascript.jscomp.AbstractCommandLineRunner.doRun(AbstractCommandLineRunner.java:1187) at com.google.javascript.jscomp.AbstractCommandLineRunner.run(AbstractCommandLineRunner.java:551) at com.google.javascript.jscomp.CommandLineRunner.main(CommandLineRunner.java:2246) Error writing to stdin of the compiler. write EPIPE CustomError: gulp-google-closure-compiler: Compilation errors occurred at CompilationStream._compilationComplete (/arrow/js/node_modules/google-closure-compiler/lib/gulp/index.js:238:28) at /arrow/js/node_modules/google-closure-compiler/lib/gulp/index.js:208:14 at formatError (/arrow/js/node_modules/gulp-cli/lib/versioned/^4.0.0/format-error.js:21:10) at Gulp. 
(/arrow/js/node_modules/gulp-cli/lib/versioned/^4.0.0/log/events.js:33:15) at Gulp.emit (node:events:538:35) at Gulp.emit (node:domain:475:12) at Object.error (/arrow/js/node_modules/undertaker/lib/helpers/createExtensions.js:61:10) at handler (/arrow/js/node_modules/now-and-later/lib/mapSeries.js:47:14) at f (/arrow/js/node_modules/once/once.js:25:25) at f (/arrow/js/node_modules/once/once.js:25:25) at tryCatch (/arrow/js/node_modules/bach/node_modules/async-done/index.js:24:15) at done (/arrow/js/node_modules/bach/node_modules/async-done/index.js:40:12) [07:42:29] 'build:es2015:umd' errored after 3.02 min {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17385) [Integration] Re-enable disabled Rust Flight middleware test
David Li created ARROW-17385: Summary: [Integration] Re-enable disabled Rust Flight middleware test Key: ARROW-17385 URL: https://issues.apache.org/jira/browse/ARROW-17385 Project: Apache Arrow Issue Type: Improvement Components: Integration Reporter: David Li Assignee: David Li Follow-up for ARROW-10961. The linked Rust issue was fixed, so we should re-enable the integration test case. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17342) [Java] Improve testing of Dataset bindings
David Li created ARROW-17342: Summary: [Java] Improve testing of Dataset bindings Key: ARROW-17342 URL: https://issues.apache.org/jira/browse/ARROW-17342 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: David Li From https://github.com/apache/arrow/pull/13811 * We should ensure all types are tested * We should organize tests in a way that Parquet, IPC, and eventually CSV/ORC can mostly share test code (save for perhaps skipping/overriding specific format-type pairs) Incidentally: it may be good to incrementally port this module to JUnit5 and drop JUnit4 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17307) [C++][FlightRPC] Fix linking of Flight/gRPC example on MacOS
David Li created ARROW-17307: Summary: [C++][FlightRPC] Fix linking of Flight/gRPC example on MacOS Key: ARROW-17307 URL: https://issues.apache.org/jira/browse/ARROW-17307 Project: Apache Arrow Issue Type: Bug Components: C++, FlightRPC Reporter: David Li {{flight_grpc_example}} uses {{--no-as-needed}} but this doesn't work on MacOS. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17300) [Java][Docs] Compare/contrast the Netty and Unsafe memory backends
David Li created ARROW-17300: Summary: [Java][Docs] Compare/contrast the Netty and Unsafe memory backends Key: ARROW-17300 URL: https://issues.apache.org/jira/browse/ARROW-17300 Project: Apache Arrow Issue Type: Improvement Components: Documentation, Java Reporter: David Li We should compare the two backends and explain why you might want to use each. Are there benchmarks in the Java benchmark suite that might also be useful? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17270) [Docs] Move Java nightlies instructions to developer docs to comply with ASF policies
David Li created ARROW-17270: Summary: [Docs] Move Java nightlies instructions to developer docs to comply with ASF policies Key: ARROW-17270 URL: https://issues.apache.org/jira/browse/ARROW-17270 Project: Apache Arrow Issue Type: Improvement Components: Documentation, Java Reporter: David Li https://github.com/apache/arrow/pull/13755#pullrequestreview-1056673168 {quote} BTW, can we move the "Installing Nightly Packages" section to development documents (in a follow-up task)? It seems that this doesn't follow the ASF policy (It seems that "Use them at your own risk" isn't suitable for the ASF policy): https://www.apache.org/legal/release-policy.html#publication Projects SHALL publish official releases and SHALL NOT publish unreleased materials outside the development community. During the process of developing software and preparing a release, various packages are made available to the development community for testing purposes. Projects MUST direct outsiders towards official releases rather than raw source repositories, nightly builds, snapshots, release candidates, or any other similar packages. Projects SHOULD make available developer resources to support individuals actively participating in development or following the dev list and thus aware of the conditions placed on unreleased materials. {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17268) [C++] JSON kernels
David Li created ARROW-17268: Summary: [C++] JSON kernels Key: ARROW-17268 URL: https://issues.apache.org/jira/browse/ARROW-17268 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: David Li As discussed on dev@: https://lists.apache.org/thread/onzgogx2c2djxs0wbhmvqp2dbx7kjf6o "[ARROW-17255] Logical JSON type in Arrow" It would be interesting to have JSON parsing/serializing compute functions that operate on columns of (stringified) JSON records. For parsing, the problem is we need to know the output schema without being able to look at the data, so we would probably only be able to decode into a {{map[string, union]}} type at best. And/or we could offer "extraction" functions akin to what things like SQLite and Postgres provide (at the cost of having to reparse the JSON over and over). Also see ARROW-17255 for a logical JSON type. -- This message was sent by Atlassian Jira (v8.20.10#820010)
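The "extraction" alternative mentioned above can be sketched in plain Python (json_extract here is a hypothetical kernel name modeled on SQLite's json_extract, not an existing Arrow function); note the stated cost, that every call reparses every JSON string in the column:

```python
import json


def json_extract(column, path):
    # Extraction-kernel sketch: reparse each stringified JSON record
    # and walk a simple dotted path, akin to SQLite's json_extract.
    # Missing keys and null inputs yield None (SQL NULL).
    out = []
    for s in column:
        value = json.loads(s) if s is not None else None
        for key in path.split("."):
            if value is None:
                break
            value = value.get(key) if isinstance(value, dict) else None
        out.append(value)
    return out


col = ['{"a": {"b": 1}}', '{"a": {"b": 2}}', '{"c": 3}']
assert json_extract(col, "a.b") == [1, 2, None]
```

A full parse kernel would instead have to commit to an output type up front, which is why the ticket suggests something like a map[string, union] at best.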
[jira] [Created] (ARROW-17254) [C++][FlightRPC] Flight SQL server does not implement GetSchema
David Li created ARROW-17254: Summary: [C++][FlightRPC] Flight SQL server does not implement GetSchema Key: ARROW-17254 URL: https://issues.apache.org/jira/browse/ARROW-17254 Project: Apache Arrow Issue Type: Improvement Components: C++, FlightRPC Reporter: David Li This is specified, but not actually implemented! It needs to be covered in integration tests, too. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17242) [C++][FlightRPC] Implement and call FlightDataStream::Close()
David Li created ARROW-17242: Summary: [C++][FlightRPC] Implement and call FlightDataStream::Close() Key: ARROW-17242 URL: https://issues.apache.org/jira/browse/ARROW-17242 Project: Apache Arrow Issue Type: Improvement Components: C++, FlightRPC Reporter: David Li Assignee: David Li For RecordBatchStream, this should dispatch to the underlying RecordBatchReader::Close. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17230) [C++] Fix minor bugs in Substrait to ExecPlan conversion
David Li created ARROW-17230: Summary: [C++] Fix minor bugs in Substrait to ExecPlan conversion Key: ARROW-17230 URL: https://issues.apache.org/jira/browse/ARROW-17230 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: David Li Assignee: David Li * The return type of DeserializePlan is wrong: it should be {{shared_ptr}}, else we get a use-after-free. * Errors are ignored where they shouldn't be: you can get a half-constructed plan instead of an error. * A stateful callback is called twice, leading to invalid options being passed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
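The third bug above, a stateful callback invoked twice, is a general hazard. A guard like the following (a hypothetical Python sketch, not the actual Substrait consumer code) makes a second invocation fail loudly instead of silently producing invalid options:

```python
def once(callback):
    # Wrap a single-use callback so an accidental second call raises,
    # rather than re-running logic whose state was consumed the first time.
    called = False

    def wrapper(*args, **kwargs):
        nonlocal called
        if called:
            raise RuntimeError("single-use callback invoked twice")
        called = True
        return callback(*args, **kwargs)

    return wrapper


make_options = once(lambda: {"use_threads": True})
assert make_options() == {"use_threads": True}
```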
[jira] [Created] (ARROW-17229) [C++] ReadRel is translated to a source node that emits unexpected fields
David Li created ARROW-17229: Summary: [C++] ReadRel is translated to a source node that emits unexpected fields Key: ARROW-17229 URL: https://issues.apache.org/jira/browse/ARROW-17229 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: David Li Currently, a Substrait plan with a RelRoot containing a ReadRel will emit extra, unexpected fields, namely {{__fragment_index}} et al. Right now they are always included by default. There are a few things to be done: * ReadRel's {{base_schema}} could be converted into a {{ScanOptions.dataset_schema}} to limit the fields read. (Also see ARROW-15585; these fields should be used for pushdown projection) * The scanner always adds these extra fields - maybe it should be opt-in instead * There's no way to manually insert a Project to "fix" things, because as implemented it can only add new columns -- This message was sent by Atlassian Jira (v8.20.10#820010)
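The missing "removing" Project amounts to a simple column filter. As a rough illustration (the helper name and the leading-double-underscore heuristic are assumptions for this sketch, not Arrow code), a post-processing step that drops the scanner's augmented {{__fragment_index}}-style fields would look like:

```python
def strip_augmented(columns):
    # Drop scanner-added helper fields, identified here by the
    # double-underscore prefix used by names like __fragment_index.
    return {name: col for name, col in columns.items()
            if not name.startswith("__")}


cols = {"x": [1, 2], "__fragment_index": [0, 0]}
assert strip_augmented(cols) == {"x": [1, 2]}
```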
[jira] [Created] (ARROW-17214) [C++] Implement Scalar CastTo from all types to String
David Li created ARROW-17214: Summary: [C++] Implement Scalar CastTo from all types to String Key: ARROW-17214 URL: https://issues.apache.org/jira/browse/ARROW-17214 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: David Li As reported on the mailing list: https://lists.apache.org/thread/rp7vpjtt4lgtjxj35oyjyqh9b6on94jf Some types, including LIST, LARGE_LIST, and MAP, do not implement casts to string. Ideally we'd implement these (implement all to-string casts?) by leveraging the existing cast for any formattable type. -- This message was sent by Atlassian Jira (v8.20.10#820010)
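The idea of "leveraging the existing cast for any formattable type" is that one recursive formatter covers LIST-, LARGE_LIST-, and MAP-like values, so every formattable type gets a to-string cast for free. This Python sketch is illustrative only and does not mirror Arrow's actual C++ Scalar implementation:

```python
def format_value(v):
    # One generic formatter for nested values: any type it can reach
    # gets a to-string conversion without a per-type cast kernel.
    if v is None:
        return "null"
    if isinstance(v, dict):   # MAP-like
        items = ", ".join(f"{format_value(k)}: {format_value(x)}"
                          for k, x in v.items())
        return "{" + items + "}"
    if isinstance(v, list):   # LIST / LARGE_LIST-like
        return "[" + ", ".join(format_value(x) for x in v) + "]"
    return str(v)             # primitive fallback


assert format_value([1, 2, None]) == "[1, 2, null]"
assert format_value({"k": [3]}) == "{k: [3]}"
```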
[jira] [Created] (ARROW-17199) [FlightRPC][Java] Fix example Flight SQL server
David Li created ARROW-17199: Summary: [FlightRPC][Java] Fix example Flight SQL server Key: ARROW-17199 URL: https://issues.apache.org/jira/browse/ARROW-17199 Project: Apache Arrow Issue Type: Bug Components: FlightRPC, Java Reporter: David Li Assignee: David Li There are a number of small bugs in the Java Flight SQL example (e.g. binding parameters to the wrong index, not handling null parameter values, not properly reporting errors) that should be fixed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17191) [C++] MinGW Flight tests failing
David Li created ARROW-17191: Summary: [C++] MinGW Flight tests failing Key: ARROW-17191 URL: https://issues.apache.org/jira/browse/ARROW-17191 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: David Li Assignee: David Li Noticed across several PRs {noformat} [ RUN ] GrpcDataTest.TestDoExchangeError D:/a/arrow/arrow/cpp/src/arrow/flight/test_definitions.cc:490: Failure Value of: _st.IsNotImplemented() Actual: false Expected: true Expected 'writer->Close()' to fail with NotImplemented, but got IOError: Stream finished before first message sent. gRPC client debug context: UNKNOWN:Error received from peer ipv4:127.0.0.1:52323 {created_time:"2022-07-23T01:21:23.785644223+00:00", grpc_status:2, grpc_message:"Stream finished before first message sent"}. Client context: OK. Detail: Failed D:/a/arrow/arrow/cpp/src/arrow/flight/test_definitions.cc:490: Failure Value of: _st.ToString() Expected: has substring "Expected error" Actual: "IOError: Stream finished before first message sent. gRPC client debug context: UNKNOWN:Error received from peer ipv4:127.0.0.1:52323 {created_time:\"2022-07-23T01:21:23.785644223+00:00\", grpc_status:2, grpc_message:\"Stream finished before first message sent\"}. Client context: OK. Detail: Failed" [ FAILED ] GrpcDataTest.TestDoExchangeError (5 ms) [ RUN ] GrpcDataTest.TestDoExchangeConcurrency [ OK ] GrpcDataTest.TestDoExchangeConcurrency (5 ms) [ RUN ] GrpcDataTest.TestDoExchangeUndrained [ OK ] GrpcDataTest.TestDoExchangeUndrained (4 ms) [ RUN ] GrpcDataTest.TestIssue5095 [ OK ] GrpcDataTest.TestIssue5095 (9 ms) [--] 17 tests from GrpcDataTest (891 ms total) [--] 7 tests from GrpcDoPutTest [ RUN ] GrpcDoPutTest.TestInts D:/a/arrow/arrow/cpp/src/arrow/flight/test_definitions.cc:690: Failure Failed 'writer->Close()' failed with Invalid: Expected app_metadata to be foo bar but got \0L��. 
gRPC client debug context: UNKNOWN:Error received from peer ipv4:127.0.0.1:52331 {grpc_message:"Expected app_metadata to be foo bar but got \x00L\xf4\x86\x02\xe0\xa1", grpc_status:3, created_time:"2022-07-23T01:21:23.810734286+00:00"}. Client context: OK [ FAILED ] GrpcDoPutTest.TestInts (4 ms) [ RUN ] GrpcDoPutTest.TestFloats D:/a/arrow/arrow/cpp/src/arrow/flight/test_definitions.cc:690: Failure Failed 'writer->Close()' failed with Invalid: Expected app_metadata to be foo bar but got \0<���. gRPC client debug context: UNKNOWN:Error received from peer ipv4:127.0.0.1:52333 {grpc_message:"Expected app_metadata to be foo bar but got \x00<\xee\xc6\x02\xe0\xa1", grpc_status:3, created_time:"2022-07-23T01:21:23.815439591+00:00"}. Client context: OK [ FAILED ] GrpcDoPutTest.TestFloats (4 ms) [ RUN ] GrpcDoPutTest.TestEmptyBatch D:/a/arrow/arrow/cpp/src/arrow/flight/test_definitions.cc:690: Failure Failed 'writer->Close()' failed with Invalid: Expected app_metadata to be foo bar but got \0���. gRPC client debug context: UNKNOWN:Error received from peer ipv4:127.0.0.1:52335 {grpc_message:"Expected app_metadata to be foo bar but got \x00\x9c\xef\xa6\x02\xe0\xa1", grpc_status:3, created_time:"2022-07-23T01:21:23.819872813+00:00"}. Client context: OK [ FAILED ] GrpcDoPutTest.TestEmptyBatch (4 ms) [ RUN ] GrpcDoPutTest.TestDicts D:/a/arrow/arrow/cpp/src/arrow/flight/test_definitions.cc:690: Failure Failed 'writer->Close()' failed with Invalid: Expected app_metadata to be foo bar but got \0\���. gRPC client debug context: UNKNOWN:Error received from peer ipv4:127.0.0.1:52337 {grpc_message:"Expected app_metadata to be foo bar but got \x00\\\xf0\xc6\x02\xe0\xa1", grpc_status:3, created_time:"2022-07-23T01:21:23.824172893+00:00"}. 
Client context: OK [ FAILED ] GrpcDoPutTest.TestDicts (4 ms) [ RUN ] GrpcDoPutTest.TestLargeBatch D:/a/arrow/arrow/cpp/src/arrow/flight/test_definitions.cc:690: Failure Failed 'writer->Close()' failed with Invalid: Expected app_metadata to be foo bar but got \0|��. gRPC client debug context: UNKNOWN:Error received from peer ipv4:127.0.0.1:52339 {created_time:"2022-07-23T01:21:24.001437714+00:00", grpc_status:3, grpc_message:"Expected app_metadata to be foo bar but got \x00|\xf2\xa6\x02\xe0\xa1"}. Client context: OK [ FAILED ] GrpcDoPutTest.TestLargeBatch (185 ms) [ RUN ] GrpcDoPutTest.TestSizeLimit D:/a/arrow/arrow/cpp/src/arrow/flight/test_definitions.cc:802: Failure Failed 'writer->Close()' failed with Invalid: Expected app_metadata to be foo bar but got \0\�,�. gRPC client debug context: UNKNOWN:Error received from peer ipv4:127.0.0.1:52341 {grpc_message:"Expected app_metadata to be foo bar but got \x00\\\xef,\x07\xe0\xa1", grpc_status:3, created_time:"2022-07-23T01:21:24.016917836+00:00"}. Client context: OK [ FAILED ] GrpcDoPutTest.TestSizeLimit (8 ms) [ RUN ]
[jira] [Created] (ARROW-17163) [C++] Don't install jni_util.h
David Li created ARROW-17163: Summary: [C++] Don't install jni_util.h Key: ARROW-17163 URL: https://issues.apache.org/jira/browse/ARROW-17163 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: David Li ARROW-17086 fixed some compiler warnings and restored the installation of jni_util.h to match prior behavior. But we never intended to expose this header, and the downstream Gluten project [no longer depends on it|https://github.com/apache/arrow/pull/13614#issuecomment-1191198106], so we can stop installing it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17113) [Java] All static initializers should catch and report exceptions
David Li created ARROW-17113: Summary: [Java] All static initializers should catch and report exceptions Key: ARROW-17113 URL: https://issues.apache.org/jira/browse/ARROW-17113 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: David Li As reported on the mailing list: https://lists.apache.org/thread/gysn25gsm4v1fvvx9l0sjyr627xy7q65 All static initializers should catch and report exceptions, or else they will get swallowed by the JVM. -- This message was sent by Atlassian Jira (v8.20.10#820010)
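The advice generalizes beyond Java. As a rough Python analogue (the module names and the simulated failure are made up for this sketch), one-time initialization records and reports its failure so later callers see the root cause rather than an opaque secondary error:

```python
import logging

logger = logging.getLogger(__name__)

# Analogue of a Java static initializer: one-time setup that can fail.
# Catch and *record* the error instead of letting it be swallowed.
_INIT_ERROR = None
try:
    raise OSError("libexample not found")  # simulated native-load failure
except Exception as e:
    _INIT_ERROR = e
    logger.error("one-time initialization failed: %s", e)


def require_initialized():
    # Re-raise with the original cause chained, so the first failure
    # is reported instead of a confusing downstream symptom.
    if _INIT_ERROR is not None:
        raise RuntimeError("module failed to initialize") from _INIT_ERROR
```

In the JVM, the unreported equivalent surfaces later as a hard-to-diagnose error far from the real cause, which is what the ticket asks to avoid.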
[jira] [Created] (ARROW-17109) [C++] Clean up includes in exec_plan.h
David Li created ARROW-17109: Summary: [C++] Clean up includes in exec_plan.h Key: ARROW-17109 URL: https://issues.apache.org/jira/browse/ARROW-17109 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: David Li Notably, it includes logging.h transitively via exec/util.h which we should avoid. We should perhaps add to/create an arrow/compute/exec/type_fwd.h -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17105) [Python] test_filesystem_dataset_no_filesystem_interaction segfault on s390x
David Li created ARROW-17105: Summary: [Python] test_filesystem_dataset_no_filesystem_interaction segfault on s390x Key: ARROW-17105 URL: https://issues.apache.org/jira/browse/ARROW-17105 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: David Li Python on s390x test failed: {noformat} usr/local/lib/python3.8/dist-packages/pyarrow/tests/test_dataset.py::test_filesystem_dataset_no_filesystem_interaction[threaded] Fatal Python error: Segmentation fault Thread 0x03ff954f3700 (most recent call first): PASSED [ 23%] File "/usr/local/lib/python3.8/dist-packages/pluggy/_callers.py", line 60 in _multicall File "/usr/local/lib/python3.8/dist-packages/pluggy/_manager.py", line 80 in _hookexec File "/usr/local/lib/python3.8/ usr/local/lib/python3.8/dist-packages/pyarrow/tests/test_dataset.py::test_filesystem_dataset_no_filesystem_interaction[serial] dist-packages/pluggy/_hooks.py", line 265 in __call__ File "/usr/local/lib/python3.8/dist-packages/pluggy/_manager.py", line 80 in _hookexec File "/usr/local/lib/python3.8/dist-packages/pluggy/_callers.pyPASSED [ 23%]", line 18 in _multicall File "/usr/loc usr/local/lib/python3.8/dist-packages/pyarrow/tests/test_dataset.py::test_dataset[threaded] al/lib/python3.SKIPPED [ 23%] usr/local/lib/python3.8/dist-packages/pyarrow/tests/test_dataset.py::test_dataset[serial] SKIPPED [ 23%] usr/local/lib/python3.8/dist-packages/pyarrow/tests/test_dataset.py::test_scanner[threaded] 8/dist-packages/pluggy/_callers.py", line 33 in _multicall File "/usr/local/lib/python3.8/dist-packages/_pytest/runner.py", line 223 in call_and_report File "/usr/local/lib/python3.8/dist-packages/_pytest/runner.py", line 137 in runtestprotocol File "/usr/local/lib/python3.8/dist-packages/_pytest/runner.py", line 113 in pytest_runtest_protocol File "/usr/local/lib/python3.8/dist-packages/pluggy/_callers.py", line 60 in _multicall File "/usr/local/lib/python3.8/dist-packages/pluggy/_manager.py", line 80 in _hookexec File 
"/usr/local/lib/python3.8/dist-packages/pluggy/_hooks.py", line 265 in __call__ File "/usr/local/lib/python3.8/dist-packages/_pytest/main.py", line 347 in pytest_runtestloop File "/usr/local/lib/python3.8/dist-packages/pluggy/_callers.py", line 39 in _multicall File "/usr/local/lib/python3.8/dist-packages/pluggy/_manager.py", line 80 in _hookexec File "/usr/local/lib/python3.8/dist-packages/pluggy/_hooks.py", line 265 in __call__ File "/usr/local/lib/python3.8/dist-packages/_pytest/main.py", line 322 in _main File "/usr/local/lib/python3.8/dist-packages/_pytest/main.py", line 268 in wrap_session File "/usr/local/lib/python3.8/dist-packages/_pytest/main.py", line 315 in pytest_cmdline_main File "/usr/local/lib/python3.8/dist-packages/pluggy/_callers.py", line 39 in _multicall File "/usr/local/lib/python3.8/dist-packages/pluggy/_manager.py", line 80 in _hookexec File "/usr/local/lib/python3.8/dist-packages/pluggy/_hooks.py", line 265 in __call__ File "/usr/local/lib/python3.8/dist-packages/_pytest/config/__init__.py", line 164 in main File "/usr/local/lib/python3.8/dist-packages/_pytest/config/__init__.py", line 187 in console_main File "/usr/local/bin/pytest", line 8 in /arrow/ci/scripts/python_test.sh: line 57: 10190 Segmentation fault (core dumped) pytest -r s -v ${PYTEST_ARGS} --pyargs pyarrow 139 Error: `docker-compose --file /home/travis/build/apache/arrow/docker-compose.yml run --rm -e ARROW_BUILD_STATIC=OFF -e ARROW_FLIGHT=ON -e ARROW_GCS=OFF -e ARROW_MIMALLOC=OFF -e ARROW_ORC=OFF -e ARROW_PARQUET=OFF -e ARROW_PYTHON=ON -e ARROW_S3=OFF -e CMAKE_BUILD_PARALLEL_LEVEL=2 -e CMAKE_UNITY_BUILD=ON -e PARQUET_BUILD_EXAMPLES=OFF -e PARQUET_BUILD_EXECUTABLES=OFF -e Protobuf_SOURCE=BUNDLED -e gRPC_SOURCE=BUNDLED --volume /home/travis/build/apache/arrow/build:/build ubuntu-python` exited with a non-zero exit code 139, see the process log above. {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17093) [C++][CI] Enable libSegFault for C++ tests
David Li created ARROW-17093: Summary: [C++][CI] Enable libSegFault for C++ tests Key: ARROW-17093 URL: https://issues.apache.org/jira/browse/ARROW-17093 Project: Apache Arrow Issue Type: Improvement Components: C++, Continuous Integration Reporter: David Li Adding libSegFault.so could make it easier to diagnose CI failures. It will print a backtrace on segfault. {noformat} env SEGFAULT_SIGNALS=all \ LD_PRELOAD=/lib/x86_64-linux-gnu/libSegFault.so {noformat} This will give a backtrace like this on segfault: {noformat} Backtrace: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f8f4a0b900b] /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f8f4a098859] /lib/x86_64-linux-gnu/libc.so.6(+0x8d26e)[0x7f8f4a10326e] /lib/x86_64-linux-gnu/libc.so.6(+0x952fc)[0x7f8f4a10b2fc] /lib/x86_64-linux-gnu/libc.so.6(+0x96f6d)[0x7f8f4a10cf6d] /tmp/arrow-HEAD.y8UwB/cpp-build/release/flight-test-integration-client(_ZNSt8_Rb_treeISt10shared_ptrIN5arrow8DataTypeEES3_St9_IdentityIS3_ESt4lessIS3_ESaIS3_EE8_M_eraseEPSt13_Rb_tree_nodeIS3_E+0x39)[0x5557a9a83b19] /tmp/arrow-HEAD.y8UwB/cpp-build/release/flight-test-integration-client(_ZNSt8_Rb_treeISt10shared_ptrIN5arrow8DataTypeEES3_St9_IdentityIS3_ESt4lessIS3_ESaIS3_EE8_M_eraseEPSt13_Rb_tree_nodeIS3_E+0x1f)[0x5557a9a83aff] /tmp/arrow-HEAD.y8UwB/cpp-build/release/flight-test-integration-client(_ZNSt3setISt10shared_ptrIN5arrow8DataTypeEESt4lessIS3_ESaIS3_EED1Ev+0x33)[0x5557a9a83b83] /lib/x86_64-linux-gnu/libc.so.6(__cxa_finalize+0xce)[0x7f8f4a0bcfde] /tmp/arrow-HEAD.y8UwB/cpp-build/release/libarrow.so.900(+0x440b67)[0x7f8f47d56b67] {noformat} Caveats: * The path is OS-specific * We could integrate it into the build tooling instead of doing it via env var * Are there easily accessible equivalents for MacOS and Windows we could use? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17052) [C++][Python][FlightRPC] Ensure ::Serialize and ::Deserialize are consistently implemented
David Li created ARROW-17052: Summary: [C++][Python][FlightRPC] Ensure ::Serialize and ::Deserialize are consistently implemented Key: ARROW-17052 URL: https://issues.apache.org/jira/browse/ARROW-17052 Project: Apache Arrow Issue Type: Improvement Components: C++, FlightRPC, Python Reporter: David Li Structures like Action don't expose these methods even though ones like FlightInfo do. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17025) [Dev] Merge script could warn if username pings would be present in commit message
David Li created ARROW-17025: Summary: [Dev] Merge script could warn if username pings would be present in commit message Key: ARROW-17025 URL: https://issues.apache.org/jira/browse/ARROW-17025 Project: Apache Arrow Issue Type: Improvement Components: Developer Tools Reporter: David Li If a PR gets merged and its description {{@}} references a user, then the user will get a GitHub notification every time that commit gets pushed to a fork. This can be rather a bother, so it might be nice if the merge script could warn about this, or possibly even rewrite the commit message. -- This message was sent by Atlassian Jira (v8.20.10#820010)
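The proposed check can be sketched in a few lines; the regex and helper names below are illustrative, not the actual merge-script code.

```python
import re

# GitHub usernames are alphanumerics and hyphens, up to 39 characters.
# Pattern and helpers are illustrative, not Arrow's merge script.
MENTION_RE = re.compile(r"(?<![\w.])@([A-Za-z0-9](?:[A-Za-z0-9-]{0,38}))")

def find_mentions(commit_message):
    """Return usernames that would be pinged if this message is committed."""
    return MENTION_RE.findall(commit_message)

def warn_on_mentions(commit_message):
    """Print a warning and return True if the message contains @-mentions."""
    mentions = find_mentions(commit_message)
    if mentions:
        print("Warning: commit message pings users:",
              ", ".join(sorted(set(mentions))))
    return bool(mentions)
```

The lookbehind avoids flagging email addresses, where the `@` is preceded by a word character; rewriting the message (e.g. wrapping usernames in backticks) would be a further step on top of this check.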
[jira] [Created] (ARROW-17024) [Java] Ensure Flight with native Netty transport is actually being tested
David Li created ARROW-17024: Summary: [Java] Ensure Flight with native Netty transport is actually being tested Key: ARROW-17024 URL: https://issues.apache.org/jira/browse/ARROW-17024 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC, Java Reporter: David Li Assignee: David Li There's only one test that exercises the domain socket path and it appears it's getting skipped on CI {noformat} [INFO] Running org.apache.arrow.flight.TestServerOptions Warning: Tests run: 5, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.024 s - in org.apache.arrow.flight.TestServerOptions {noformat} We should make sure this test works and figure out whatever Maven magic we need to get the right dependencies on the right platforms -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17006) [Java] arrow-jdbc defines type but not value mapping for struct types
David Li created ARROW-17006: Summary: [Java] arrow-jdbc defines type but not value mapping for struct types Key: ARROW-17006 URL: https://issues.apache.org/jira/browse/ARROW-17006 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: David Li While Types.STRUCT is mapped to ArrowType.Struct, we need additional config to be able to actually read such values, similar to ARROW-4142. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17004) [Java] Implement Arrow->JDBC prepared statement parameters for arrow-jdbc
David Li created ARROW-17004: Summary: [Java] Implement Arrow->JDBC prepared statement parameters for arrow-jdbc Key: ARROW-17004 URL: https://issues.apache.org/jira/browse/ARROW-17004 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: David Li Assignee: David Li arrow-jdbc can turn JDBC ResultSets into Arrow VectorSchemaRoots. However, it would also be useful to have the opposite: bind values from a VectorSchemaRoot to a PreparedStatement for inserting/updating data. This is necessary for the ADBC project but isn't ADBC specific, so it could be added to arrow-jdbc. We should also document the type mapping it uses and how to customize the mapping. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17003) [Java][Docs] Document JDBC module
David Li created ARROW-17003: Summary: [Java][Docs] Document JDBC module Key: ARROW-17003 URL: https://issues.apache.org/jira/browse/ARROW-17003 Project: Apache Arrow Issue Type: Improvement Components: Documentation, Java Reporter: David Li Assignee: David Li The arrow-jdbc submodule could use its own documentation page. In particular, we should document the type mapping it uses (and the rationale where applicable) and how to customize it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-16994) [Docs][CI] Clean up some docs warnings and increase CI timeout
David Li created ARROW-16994: Summary: [Docs][CI] Clean up some docs warnings and increase CI timeout Key: ARROW-16994 URL: https://issues.apache.org/jira/browse/ARROW-16994 Project: Apache Arrow Issue Type: Bug Components: Continuous Integration, Documentation Reporter: David Li The docs are starting to take just about 30 minutes to build, causing spurious timeouts. Also, there are several warnings that could/should be fixed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-16958) [C++][FlightRPC] Flight generates misaligned buffers
David Li created ARROW-16958: Summary: [C++][FlightRPC] Flight generates misaligned buffers Key: ARROW-16958 URL: https://issues.apache.org/jira/browse/ARROW-16958 Project: Apache Arrow Issue Type: Bug Components: C++, FlightRPC Reporter: David Li Protobuf's wire format design + our zero-copy serializer/deserializer mean that buffers can end up misaligned. On some Arrow versions, this can cause segfaults in kernels assuming alignment (and generally violates expectations). We should: * Possibly include buffer alignment in array validation * See if we can adjust the serializer to somehow pad things properly * See if we can do anything about this in the deserializer Example: {code:python} import pyarrow as pa import pyarrow.flight as flight class TestServer(flight.FlightServerBase): def do_get(self, context, ticket): schema = pa.schema( [ ("index", pa.int64()), ("int8", pa.float64()), ("int16", pa.float64()), ("int32", pa.float64()), ] ) return flight.RecordBatchStream(pa.table([ [0, 1, 2, 3], [0, 1, None, 3], [0, 1, 2, None], [0, None, 2, 3], ], schema=schema)) with TestServer() as server: client = flight.connect(f"grpc://localhost:{server.port}") table = client.do_get(flight.Ticket(b"")).read_all() for col in table: print(col.type) for chunk in col.chunks: for buf in chunk.buffers(): if not buf: continue print("buffer is 8-byte aligned?", buf.address % 8) chunk.cast(pa.float32()) {code} On Arrow 8 {noformat} int64 buffer is 8-byte aligned? 1 double buffer is 8-byte aligned? 1 buffer is 8-byte aligned? 1 double buffer is 8-byte aligned? 1 buffer is 8-byte aligned? 1 double buffer is 8-byte aligned? 1 buffer is 8-byte aligned? 1 {noformat} On Arrow 7 {noformat} int64 buffer is 8-byte aligned? 4 double buffer is 8-byte aligned? 4 buffer is 8-byte aligned? 4 fish: Job 1, 'python ../test.py' terminated by signal SIGSEGV (Address boundary error) {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
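The serializer-side fix discussed above reduces to simple alignment arithmetic; the following is a generic sketch of that computation, not Flight's actual serializer code.

```python
def padding_for(offset, alignment=8):
    """Bytes of padding needed so that data written at `offset`
    starts on an `alignment`-byte boundary."""
    return (alignment - offset % alignment) % alignment

def align_up(offset, alignment=8):
    """Smallest multiple of `alignment` that is >= `offset`."""
    return offset + padding_for(offset, alignment)
```

A serializer would apply `align_up` to the wire-format offset before each buffer so the deserialized buffers land 8-byte (or 64-byte) aligned.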
[jira] [Created] (ARROW-16944) [C++] Create macro-benchmarks of file format readers
David Li created ARROW-16944: Summary: [C++] Create macro-benchmarks of file format readers Key: ARROW-16944 URL: https://issues.apache.org/jira/browse/ARROW-16944 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: David Li Currently we have (some) microbenchmarks, but measuring performance of our various readers (CSV, JSON, IPC, Parquet, ORC) over "real world" files would also be interesting and hopefully more illustrative of the use cases we actually care about. Such benchmarks may be expensive, though. Ideally, we would do this in a variety of scenarios: in-memory (to focus on CPU optimization), on-disk (though such measurements would likely be extremely noisy?), and over the network (perhaps with something like Minio + Toxiproxy to try to have a consistent, reproducible setup) so that we can also judge the I/O characteristics of the readers. -- This message was sent by Atlassian Jira (v8.20.10#820010)
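The in-memory scenario can be sketched minimally: time a reader callable over an in-memory copy of the file so CPU cost is isolated from disk I/O. `read_file` here is a stand-in for whichever format reader is under test, not an Arrow API.

```python
import io
import time

def benchmark_reader(read_file, payload, repetitions=5):
    """Time `read_file` (a callable taking a file-like object) over an
    in-memory copy of `payload`. Returns the best wall-clock time in
    seconds across `repetitions` runs, to reduce noise."""
    timings = []
    for _ in range(repetitions):
        buf = io.BytesIO(payload)  # fresh buffer each run
        start = time.perf_counter()
        read_file(buf)
        timings.append(time.perf_counter() - start)
    return min(timings)
```

The on-disk and over-the-network scenarios would swap `io.BytesIO` for a real file or a Minio/Toxiproxy endpoint, keeping the same harness.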
[jira] [Created] (ARROW-16913) [Java] Implement ArrowArrayStream/C Stream Interface
David Li created ARROW-16913: Summary: [Java] Implement ArrowArrayStream/C Stream Interface Key: ARROW-16913 URL: https://issues.apache.org/jira/browse/ARROW-16913 Project: Apache Arrow Issue Type: New Feature Components: Java Reporter: David Li ARROW-12965 implemented the core C Data Interface, but we still need to implement the streaming interface. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16902) [C++] Flight SQL fails to build on Windows due to incorrect usage of DLL linkage specifiers
David Li created ARROW-16902: Summary: [C++] Flight SQL fails to build on Windows due to incorrect usage of DLL linkage specifiers Key: ARROW-16902 URL: https://issues.apache.org/jira/browse/ARROW-16902 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 8.0.0 Reporter: David Li Assignee: David Li Fix For: 9.0.0 Flight SQL uses "ARROW_EXPORT" in places, and also fails to define "ARROW_FLIGHT_EXPORTING", leading to linker issues. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16877) [C++] Valgrind failure (uninitialized value) in arrow-compute-internals-test
David Li created ARROW-16877: Summary: [C++] Valgrind failure (unintialized value) in arrow-compute-internals-test Key: ARROW-16877 URL: https://issues.apache.org/jira/browse/ARROW-16877 Project: Apache Arrow Issue Type: Improvement Reporter: David Li Looks like GTest is trying to print an uninitalized unique_ptr. https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=27986=logs=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb=d9b15392-e4ce-5e4c-0c8c-b69645229181 {noformat} 27/68 Test #28: arrow-compute-internals-test .***Failed 15.30 sec ==11317== Memcheck, a memory error detector ==11317== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==11317== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info ==11317==by 0x1C31BF: void testing::internal::PrintTupleTo > ()>, std::function, std::function, std::allocator >, std::allocator, std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > >, 2ul>(std::tuple > ()>, std::function, std::function, std::allocator >, std::allocator, std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > > const&, std::integral_constant, std::ostream*) (gtest-printers.h:641) ==11317==by 0x1C31F8: void testing::internal::PrintTupleTo > ()>, std::function, std::function, std::allocator >, std::allocator, std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > >, 3ul>(std::tuple > ()>, std::function, std::function, std::allocator >, std::allocator, std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > > const&, std::integral_constant, std::ostream*) (gtest-printers.h:641) ==11317==by 0x1C3231: void testing::internal::PrintTupleTo > ()>, std::function, std::function, std::allocator >, std::allocator, std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > >, 4ul>(std::tuple > ()>, std::function, std::function, std::allocator >, std::allocator, std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > > const&, 
std::integral_constant, std::ostream*) (gtest-printers.h:641) ==11317==by 0x1C3285: void testing::internal::PrintTo > ()>, std::function, std::function, std::allocator >, std::allocator, std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > >(std::tuple > ()>, std::function, std::function, std::allocator >, std::allocator, std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > > const&, std::ostream*) (gtest-printers.h:654) ==11317==by 0x1C32AA: Print (gtest-printers.h:691) ==11317==by 0x1C32AA: void testing::internal::UniversalPrint > ()>, std::function, std::function, std::allocator >, std::allocator, std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > > >(std::tuple > ()>, std::function, std::function, std::allocator >, std::allocator, std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > > const&, std::ostream*) (gtest-printers.h:980) ==11317==by 0x1C32E7: Print (gtest-printers.h:865) ==11317==by 0x1C32E7: std::__cxx11::basic_string, std::allocator > testing::PrintToString > ()>, std::function, std::function, std::allocator >, std::allocator, std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > > >(std::tuple > ()>, std::function, std::function, std::allocator >, std::allocator, std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > > const&) (gtest-printers.h:1018) ==11317==by 0x1C4033: testing::internal::ParameterizedTestSuiteInfo::RegisterTests() (gtest-param-util.h:590) ==11317==by 0x6438DBC: testing::internal::ParameterizedTestSuiteRegistry::RegisterTests() (gtest-param-util.h:726) ==11317==by 0x6445597: testing::internal::UnitTestImpl::RegisterParameterizedTests() (gtest.cc:2823) ==11317==by 0x64558D3: testing::internal::UnitTestImpl::PostFlagParsingInit() (gtest.cc:5639) ==11317==by 0x646C550: void testing::internal::InitGoogleTestImpl(int*, char**) (gtest.cc:6646) ==11317==by 0x64584C4: testing::InitGoogleTest(int*, char**) (gtest.cc:6664) ==11317==by 
0x4205956: main (gtest_main.cc:51) ==11317== { Memcheck:Cond fun:vfprintf fun:vsnprintf fun:snprintf fun:_ZN7testing12_GLOBAL__N_126PrintByteSegmentInObjectToEPKhmmPSo fun:_ZN7testing12_GLOBAL__N_124PrintBytesInObjectToImplEPKhmPSo fun:_ZN7testing8internal20PrintBytesInObjectToEPKhmPSo fun:PrintValue()> > fun:_ZN7testing8internal17PrintWithFallbackISt8functionIFSt10unique_ptrIN5arrow7compute16FunctionRegistryESt14default_deleteIS6_EEvvRKT_PSo fun:_ZN7testing8internal7PrintToISt8functionIFSt10unique_ptrIN5arrow7compute16FunctionRegistryESt14default_deleteIS6_EEvvRKT_PSo fun:Print
[jira] [Created] (ARROW-16873) [Python] test_debug_memory_pool_disabled segfaulting on MacOS CI
David Li created ARROW-16873: Summary: [Python] test_debug_memory_pool_disabled segfaulting on MacOS CI Key: ARROW-16873 URL: https://issues.apache.org/jira/browse/ARROW-16873 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: David Li Observed on master and many PRs, example: https://github.com/apache/arrow/runs/6991997196?check_suite_focus=true From a quick read, it's likely just that the stderr isn't necessarily empty as the test expects. {noformat} === FAILURES === _ test_debug_memory_pool_disabled[system_memory_pool] __ pool_factory = @pytest.mark.parametrize('pool_factory', supported_factories()) def test_debug_memory_pool_disabled(pool_factory): res = run_debug_memory_pool(pool_factory.__name__, "") # The subprocess either returned successfully or was killed by a signal # (due to writing out of bounds), depending on the underlying allocator. if os.name == "posix": assert res.returncode <= 0 else: res.check_returncode() > assert res.stderr == "" E assert 'Fatal Python...in \n' == '' E + Fatal Python error: Segmentation fault E + E + Current thread 0x000102009e00 (most recent call first): E + File "", line 12 in /usr/local/lib/python3.9/site-packages/pyarrow/tests/test_memory.py:245: AssertionError {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16836) [C++] Have exported ArrowArrayStreams call RecordBatchReader::Close
David Li created ARROW-16836: Summary: [C++] Have exported ArrowArrayStreams call RecordBatchReader::Close Key: ARROW-16836 URL: https://issues.apache.org/jira/browse/ARROW-16836 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: David Li We added RecordBatchReader::Close(); should we have an exported ArrowArrayStream call this? The issue is that {{release()}} can't return errors. We could call {{Close()}} implicitly after the last batch if the user drains the ArrowArrayStream, and return any error there, but if they don't drain the stream (but call {{release}}) we'll have no way to return the error. (Or we could make an ABI break…) -- This message was sent by Atlassian Jira (v8.20.7#820007)
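The trade-off described above can be sketched in plain Python; `StreamAdapter`, `batches`, and `close` are hypothetical stand-ins for the exported ArrowArrayStream, the reader's batches, and RecordBatchReader::Close.

```python
class StreamAdapter:
    """Sketch of exporting a reader over a stream whose release() cannot
    report errors: close runs after the last batch is drained, so its
    error can surface from get_next(); an early release() must swallow it."""

    def __init__(self, batches, close):
        self._it = iter(batches)
        self._close = close
        self._closed = False

    def get_next(self):
        """Return the next batch, or None at end of stream."""
        try:
            return next(self._it)
        except StopIteration:
            if not self._closed:
                self._closed = True
                self._close()  # may raise: a drained stream can report it
            return None

    def release(self):
        if not self._closed:
            self._closed = True
            try:
                self._close()
            except Exception:
                pass  # release() has no error channel; the error is lost
```

This makes the limitation concrete: only consumers that drain the stream ever see a close error, which is exactly why an ABI break (a fallible close) is floated as the alternative.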
[jira] [Created] (ARROW-16788) [C++] Some packaging builds fail to build bundled gRPC
David Li created ARROW-16788: Summary: [C++] Some packing builds fail to build bundled gRPC Key: ARROW-16788 URL: https://issues.apache.org/jira/browse/ARROW-16788 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: David Li [https://github.com/ursacomputing/crossbow/runs/6789534725?check_suite_focus=true] {noformat} FAILED: CMakeFiles/grpc++.dir/src/core/ext/transport/binder/transport/binder_transport.cc.o /usr/lib/ccache/c++ -I/build/apache-arrow-9.0.0.dev191/cpp_build/grpc_ep-prefix/src/grpc_ep/include -I/build/apache-arrow-9.0.0.dev191/cpp_build/grpc_ep-prefix/src/grpc_ep -I/build/apache-arrow-9.0.0.dev191/cpp_build/grpc_ep-prefix/src/grpc_ep/third_party/address_sorting/include -I/build/apache-arrow-9.0.0.dev191/cpp_build/grpc_ep-prefix/src/grpc_ep/src/core/ext/upb-generated -I/build/apache-arrow-9.0.0.dev191/cpp_build/grpc_ep-prefix/src/grpc_ep/src/core/ext/upbdefs-generated -I/build/apache-arrow-9.0.0.dev191/cpp_build/grpc_ep-prefix/src/grpc_ep/third_party/upb -I/build/apache-arrow-9.0.0.dev191/cpp_build/grpc_ep-prefix/src/grpc_ep/third_party/xxhash -Igens -isystem /build/apache-arrow-9.0.0.dev191/cpp_build/protobuf_ep-install/include -isystem /build/apache-arrow-9.0.0.dev191/cpp_build/absl_ep-install/include -g -O2 -fdebug-prefix-map=/build/apache-arrow-9.0.0.dev191=. -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fdiagnostics-color=always -O3 -DNDEBUG -O3 -DNDEBUG -fPIC -g -O2 -fdebug-prefix-map=/build/apache-arrow-9.0.0.dev191=. 
-fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fdiagnostics-color=always -O3 -DNDEBUG -O3 -DNDEBUG -fPIC -fPIC -pthread -std=c++11 -MD -MT CMakeFiles/grpc++.dir/src/core/ext/transport/binder/transport/binder_transport.cc.o -MF CMakeFiles/grpc++.dir/src/core/ext/transport/binder/transport/binder_transport.cc.o.d -o CMakeFiles/grpc++.dir/src/core/ext/transport/binder/transport/binder_transport.cc.o -c /build/apache-arrow-9.0.0.dev191/cpp_build/grpc_ep-prefix/src/grpc_ep/src/core/ext/transport/binder/transport/binder_transport.cc /build/apache-arrow-9.0.0.dev191/cpp_build/grpc_ep-prefix/src/grpc_ep/src/core/ext/transport/binder/transport/binder_transport.cc: In function ‘void set_pollset_set(grpc_transport*, grpc_stream*, grpc_pollset_set*)’: /build/apache-arrow-9.0.0.dev191/cpp_build/grpc_ep-prefix/src/grpc_ep/src/core/ext/transport/binder/transport/binder_transport.cc:135:29: error: format not a string literal and no format arguments [-Werror=format-security] 135 | gpr_log(GPR_INFO, __func__); | ^ {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16671) [C++][Docs] Include StopToken in documentation
David Li created ARROW-16671: Summary: [C++][Docs] Include StopToken in documentation Key: ARROW-16671 URL: https://issues.apache.org/jira/browse/ARROW-16671 Project: Apache Arrow Issue Type: Improvement Components: C++, Documentation Reporter: David Li It's used in Flight APIs at the very least so it would be good to have a doc page for it. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16644) [C++] Unsuppress -Wno-return-stack-address
David Li created ARROW-16644: Summary: [C++] Unsuppress -Wno-return-stack-address Key: ARROW-16644 URL: https://issues.apache.org/jira/browse/ARROW-16644 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: David Li Follow up for ARROW-16643: this code in {{small_vector_benchmark.cc}} generates a warning on clang-14 that we should unsuppress {code:cpp} template <typename Vector> ARROW_NOINLINE int64_t ConsumeVector(Vector v) { return reinterpret_cast<int64_t>(v.data()); } template <typename Vector> ARROW_NOINLINE int64_t IngestVector(const Vector& v) { return reinterpret_cast<int64_t>(v.data()); } {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16625) [C++] IPC: validate batch schema equals stream schema in debug mode
David Li created ARROW-16625: Summary: [C++] IPC: validate batch schema equals stream schema in debug mode Key: ARROW-16625 URL: https://issues.apache.org/jira/browse/ARROW-16625 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: David Li This came up in a Flight/Flight SQL demo a colleague was working on; it was possible to write a batch with a differing schema than what was stated for the stream, which would lead to a decoding failure on the other side. It might be useful in DEBUG mode to DCHECK this and fail-fast. The error message could also be improved; {{ArrayLoader.GetBuffer}} could at least return the index and the actual # of buffers -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16597) [Python][FlightRPC] Active server may segfault if Python interpreter shuts down
David Li created ARROW-16597: Summary: [Python][FlightRPC] Active server may segfault if Python interpreter shuts down Key: ARROW-16597 URL: https://issues.apache.org/jira/browse/ARROW-16597 Project: Apache Arrow Issue Type: Bug Components: FlightRPC, Python Affects Versions: 8.0.0 Reporter: David Li Assignee: David Li On Linux, this reliably segfaults for me with {{FATAL: exception not rethrown}}. Adding a {{server.shutdown()}} to the end fixes it. The reason is that the Python interpreter exits after running the script, and other Python threads [call PyThread_exit_thread|https://github.com/python/cpython/blob/v3.10.4/Python/ceval_gil.h#L221]. But one of the Python threads is currently in the middle of executing the RPC handler. PyThread_exit_thread boils down to pthread_exit which works by throwing an exception that it expects will not be caught. But gRPC places a {{catch(...)}} around RPC handlers and catches this exception, and then pthreads aborts when it doesn't catch the exception. We should force servers to shutdown at exit to avoid this. {code:python} import traceback import pyarrow as pa import pyarrow.flight as flight class Server(flight.FlightServerBase): def do_put(self, context, descriptor, reader, writer): raise flight.FlightCancelledError("foo", extra_info=b"bar") print("PyArrow version:", pa.__version__) server = Server("grpc://localhost:0") client = flight.connect(f"grpc://localhost:{server.port}") schema = pa.schema([]) writer, reader = client.do_put(flight.FlightDescriptor.for_command(b""), schema) try: writer.done_writing() except flight.FlightError as e: traceback.print_exc() print(e.extra_info) except Exception: traceback.print_exc() {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16588) [C++][FlightRPC] Don't inherit from ::testing::Test in Flight common tests
David Li created ARROW-16588: Summary: [C++][FlightRPC] Don't inherit from ::testing::Test in Flight common tests Key: ARROW-16588 URL: https://issues.apache.org/jira/browse/ARROW-16588 Project: Apache Arrow Issue Type: Improvement Components: C++, FlightRPC Reporter: David Li Assignee: David Li https://github.com/apache/arrow/pull/13101#issuecomment-1127553809 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16573) [C++] Add canonical header guards for C Data Interface
David Li created ARROW-16573: Summary: [C++] Add canonical header guards for C Data Interface Key: ARROW-16573 URL: https://issues.apache.org/jira/browse/ARROW-16573 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Tom Drabas See https://lists.apache.org/thread/fxrbpo9ywm0yjol9b5zgb04w6tns59qj -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16436) [C++] Datasets ignores CSV autogenerate_column_names during discovery
David Li created ARROW-16436: Summary: [C++] Datasets ignores CSV autogenerate_column_names during discovery Key: ARROW-16436 URL: https://issues.apache.org/jira/browse/ARROW-16436 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 7.0.0 Reporter: David Li Reproduction {code:python} import tempfile from pathlib import Path import pyarrow as pa import pyarrow.csv as csv import pyarrow.dataset as ds print("PyArrow version:", pa.__version__) ro = csv.ReadOptions(autogenerate_column_names=True) po = csv.ParseOptions() co = csv.ConvertOptions() file_format = ds.CsvFileFormat(read_options=ro, parse_options=po, convert_options=co) with tempfile.TemporaryDirectory() as td: td = Path(td).resolve() with (td / "test.csv").open("w") as sink: sink.write("1,a,true,1\n") dataset = ds.dataset(str(td), format=file_format) print(dataset.to_table()) {code} Result: {noformat} PyArrow version: 7.0.0 Traceback (most recent call last): File "/home/lidavidm/csvdemo.py", line 20, in dataset = ds.dataset(str(td), format=file_format) File "/home/lidavidm/miniconda3/envs/arrow/lib/python3.10/site-packages/pyarrow/dataset.py", line 667, in dataset return _filesystem_dataset(source, **kwargs) File "/home/lidavidm/miniconda3/envs/arrow/lib/python3.10/site-packages/pyarrow/dataset.py", line 422, in _filesystem_dataset return factory.finish(schema) File "pyarrow/_dataset.pyx", line 1680, in pyarrow._dataset.DatasetFactory.finish File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/tmp5rz0ipmm/test.csv': Could not open CSV input source '/tmp/tmp5rz0ipmm/test.csv': Invalid: CSV file contained multiple columns named 1. Is this a 'csv' file? {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16420) [Python] pq.write_to_dataset always ignores partitioning
David Li created ARROW-16420: Summary: [Python] pq.write_to_dataset always ignores partitioning Key: ARROW-16420 URL: https://issues.apache.org/jira/browse/ARROW-16420 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 8.0.0 Reporter: David Li The code unconditionally sets {{partitioning}} to None, so the user-supplied partitioning is ignored. https://github.com/apache/arrow/blob/edf7334fc38ec9bc2e019bf400403e7c61fb585e/python/pyarrow/parquet/__init__.py#L3143-L3146 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16419) [Python] pyarrow._exec_plan.execplan doesn't wait for plan to finish
David Li created ARROW-16419: Summary: [Python] pyarrow._exec_plan.execplan doesn't wait for plan to finish Key: ARROW-16419 URL: https://issues.apache.org/jira/browse/ARROW-16419 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 8.0.0 Reporter: David Li It calls StopProducing but doesn't actually wait for finished(). This tends to cause "Plan was destroyed before finishing" to get printed. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16417) [C++][Python] Segfault in test_exec_plan.py / test_joins
David Li created ARROW-16417: Summary: [C++][Python] Segfault in test_exec_plan.py / test_joins Key: ARROW-16417 URL: https://issues.apache.org/jira/browse/ARROW-16417 Project: Apache Arrow Issue Type: Bug Components: C++, Python Affects Versions: 8.0.0 Reporter: David Li Occurs during wheel verification. It also happens to master. The failure is sporadic but fairly reliable. test_joins is parameterized; it's not consistent in the parameters it occurs on, but it consistently occurs on that test. The backtrace reaches into malloc_consolidate. MALLOC_CHECK doesn't help. However: {noformat} (gdb) b main Breakpoint 1 at 0x11ea20: file /home/conda/feedstock_root/build_artifacts/python-split_1625973859697/work/Programs/python.c, line 15. (gdb) command 1 Type commands for breakpoint(s) 1, one per line. End with a line saying just "end". >call mcheck(0) >continue >end {noformat} This fairly consistently fails with "memory clobbered before allocated block" but the location varies. This may be a red herring though. I also tried LD_PRELOADING a secure build of mimalloc to see if it would catch any sort of heap corruption but instead the tests pass consistently with mimalloc. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16372) [Python] Tests failing on s390x because they use Parquet
David Li created ARROW-16372: Summary: [Python] Tests failing on s390x because they use Parquet Key: ARROW-16372 URL: https://issues.apache.org/jira/browse/ARROW-16372 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: David Li If I understand correctly, the Parquet implementation does not work on big-endian? So these tests need to be properly marked? https://app.travis-ci.com/github/apache/arrow/jobs/568309096 {noformat} === FAILURES === __ test_dataset_join ___ tempdir = PosixPath('/tmp/pytest-of-root/pytest-0/test_dataset_join0') @pytest.mark.dataset def test_dataset_join(tempdir): t1 = pa.table({ "colA": [1, 2, 6], "col2": ["a", "b", "f"] }) > ds.write_dataset(t1, tempdir / "t1", format="parquet") usr/local/lib/python3.8/dist-packages/pyarrow/tests/test_dataset.py:4428: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:880: in write_dataset format = _ensure_format(format) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ obj = 'parquet' def _ensure_format(obj): if isinstance(obj, FileFormat): return obj elif obj == "parquet": if not _parquet_available: > raise ValueError(_parquet_msg) E ValueError: The pyarrow installation is not built with support for the Parquet file format. 
usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:283: ValueError _ test_dataset_join_unique_key _ tempdir = PosixPath('/tmp/pytest-of-root/pytest-0/test_dataset_join_unique_key0') @pytest.mark.dataset def test_dataset_join_unique_key(tempdir): t1 = pa.table({ "colA": [1, 2, 6], "col2": ["a", "b", "f"] }) > ds.write_dataset(t1, tempdir / "t1", format="parquet") usr/local/lib/python3.8/dist-packages/pyarrow/tests/test_dataset.py:4459: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:880: in write_dataset format = _ensure_format(format) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ obj = 'parquet' def _ensure_format(obj): if isinstance(obj, FileFormat): return obj elif obj == "parquet": if not _parquet_available: > raise ValueError(_parquet_msg) E ValueError: The pyarrow installation is not built with support for the Parquet file format. usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:283: ValueError _ test_dataset_join_collisions _ tempdir = PosixPath('/tmp/pytest-of-root/pytest-0/test_dataset_join_collisions0') @pytest.mark.dataset def test_dataset_join_collisions(tempdir): t1 = pa.table({ "colA": [1, 2, 6], "colB": [10, 20, 60], "colVals": ["a", "b", "f"] }) > ds.write_dataset(t1, tempdir / "t1", format="parquet") usr/local/lib/python3.8/dist-packages/pyarrow/tests/test_dataset.py:4491: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:880: in write_dataset format = _ensure_format(format) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ obj = 'parquet' def _ensure_format(obj): if isinstance(obj, FileFormat): return obj elif obj == "parquet": if not _parquet_available: > raise ValueError(_parquet_msg) E ValueError: The pyarrow installation is not built with support for the Parquet file format. 
usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:283: ValueError _ test_parquet_invalid_version _ tempdir = PosixPath('/tmp/pytest-of-root/pytest-0/test_parquet_invalid_version0') def test_parquet_invalid_version(tempdir): table = pa.table({'a': [1, 2, 3]}) with pytest.raises(ValueError, match="Unsupported Parquet format version"): > _write_table(table, tempdir / 'test_version.parquet', version="2.2") E NameError: name '_write_table' is not defined usr/local/lib/python3.8/dist-packages/pyarrow/tests/parquet/test_basic.py:52: NameError{noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
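The failures above come from tests that unconditionally assume Parquet support. A hedged sketch of how such tests are typically guarded in a pytest suite (the guard and marker names here are illustrative, not pyarrow's actual fixtures):

```python
import pytest

# Hypothetical guard: detect at import time whether this pyarrow build
# includes Parquet support (big-endian s390x builds may not).
try:
    import pyarrow.parquet  # noqa: F401
    parquet_available = True
except ImportError:
    parquet_available = False

# Tests that need Parquet are skipped instead of failing on such builds.
requires_parquet = pytest.mark.skipif(
    not parquet_available, reason="Parquet support not built"
)

@requires_parquet
def test_dataset_join(tmp_path):
    ...  # would call ds.write_dataset(..., format="parquet")
```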
[jira] [Created] (ARROW-16347) [Packaging] verify-release-candidate fails oddly if a Conda environment is active
David Li created ARROW-16347: Summary: [Packaging] verify-release-candidate fails oddly if a Conda environment is active Key: ARROW-16347 URL: https://issues.apache.org/jira/browse/ARROW-16347 Project: Apache Arrow Issue Type: Improvement Components: Packaging Affects Versions: 8.0.0 Reporter: David Li {noformat} Conda environment is active despite that USE_CONDA is set to 0. CommandNotFoundError: No command 'conda deactive'. Did you mean 'conda deactivate'? {noformat} The next line is {{echo "Deactivate the environment using `conda deactive` before running the verification script."}} but this tries to _evaluate_ "conda deactive" which of course fails. The typo should be fixed, but also the backticks should be escaped. -- This message was sent by Atlassian Jira (v8.20.7#820007)
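The failure mode and the fix can be reproduced in any POSIX shell: inside double quotes, backticks are command substitution, so the broken line tries to run `conda deactive` instead of printing it. A minimal sketch of the escaped version:

```shell
# Unescaped backticks inside double quotes run the enclosed command:
#   echo "Deactivate ... using `conda deactive` ..."   # executes 'conda deactive'
# Escaping them makes the backticks literal:
msg="Deactivate the environment using \`conda deactivate\` before running the verification script."
echo "$msg"
```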
[jira] [Created] (ARROW-16271) [C++] Implement full chunked array support for replace_with_mask
David Li created ARROW-16271: Summary: [C++] Implement full chunked array support for replace_with_mask Key: ARROW-16271 URL: https://issues.apache.org/jira/browse/ARROW-16271 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: David Li ARROW-15928 enables this function to accept chunked arrays for the input array, but not for the mask or replacements array. More work is needed to implement those cases (which currently just return an error). We should also consider how to make this work at least somewhat reusable for similar kernels (e.g. replace_with_indices) -- This message was sent by Atlassian Jira (v8.20.7#820007)
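The core difficulty in the remaining cases can be sketched in pure Python (illustrative only, not Arrow's kernel code): the mask and replacement chunk boundaries need not line up with the value chunks, so the kernel has to re-slice them so each value chunk sees a matching slice of the logical mask.

```python
# Pure-Python sketch of aligning a chunked mask with chunked values: each
# value chunk consumes a same-length slice of the logically-concatenated
# mask. (A real kernel would slice lazily instead of flattening.)
def align_chunks(value_chunks, mask_chunks):
    flat_mask = [m for chunk in mask_chunks for m in chunk]
    aligned, offset = [], 0
    for chunk in value_chunks:
        aligned.append((chunk, flat_mask[offset:offset + len(chunk)]))
        offset += len(chunk)
    return aligned
```

The same re-slicing step is what a shared helper for replace_with_indices and similar kernels would factor out.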
[jira] [Created] (ARROW-16238) [C++] Fix nullptr dereference in ipc/reader.cc
David Li created ARROW-16238: Summary: [C++] Fix nullptr dereference in ipc/reader.cc Key: ARROW-16238 URL: https://issues.apache.org/jira/browse/ARROW-16238 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: David Li MinGW GCC catches this:
{noformat}
[20/278] Building CXX object src/arrow/CMakeFiles/arrow_shared.dir/Unity/unity_24_cxx.cxx.obj
In file included from C:/msys64/home/User/arrow/build/cpp/src/arrow/CMakeFiles/arrow_shared.dir/Unity/unity_24_cxx.cxx:3:
C:/msys64/home/User/arrow/cpp/src/arrow/ipc/reader.cc: In member function 'virtual arrow::Result >()> > arrow::ipc::RecordBatchFileReaderImpl::GetRecordBatchGenerator(bool, const arrow::io::IOContext&, arrow::io::CacheOptions, arrow::internal::Executor*)':
C:/msys64/home/User/arrow/cpp/src/arrow/ipc/reader.cc:1303:34: warning: 'this' pointer is null [-Wnonnull]
 1303 | return cached_source->Cache({{0, footer_offset_}});
      |        ^~~
In file included from C:/msys64/home/User/arrow/cpp/src/arrow/ipc/reader.h:28,
                 from C:/msys64/home/User/arrow/cpp/src/arrow/ipc/reader.cc:18,
                 from C:/msys64/home/User/arrow/build/cpp/src/arrow/CMakeFiles/arrow_shared.dir/Unity/unity_24_cxx.cxx:3:
C:/msys64/home/User/arrow/cpp/src/arrow/io/caching.h:124:10: note: in a call to non-static member function 'arrow::Status arrow::io::internal::ReadRangeCache::Cache(std::vector)'
  124 | Status Cache(std::vector ranges);
      |        ^
{noformat}
This is pretty clearly wrong:
{code:cpp}
std::shared_ptr<io::internal::ReadRangeCache> cached_source;
if (coalesce && file_->supports_zero_copy()) {
  if (!owned_file_) return Status::Invalid("Cannot coalesce without an owned file");
  // Since the user is asking for all fields then we can cache the entire
  // file (up to the footer)
  return cached_source->Cache({{0, footer_offset_}});
}
return WholeIpcFileRecordBatchGenerator(std::move(state), std::move(cached_source), io_context, executor);
{code}
It seems ARROW-14577 removed one too many lines. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16235) [C++][FlightRPC] Flight does not build on MinGW
David Li created ARROW-16235: Summary: [C++][FlightRPC] Flight does not build on MinGW Key: ARROW-16235 URL: https://issues.apache.org/jira/browse/ARROW-16235 Project: Apache Arrow Issue Type: Bug Components: C++, FlightRPC Reporter: David Li Assignee: David Li https://github.com/apache/arrow/runs/6077889425?check_suite_focus=true {noformat} [180/316] Building CXX object src/arrow/flight/CMakeFiles/arrow_flight_testing_shared.dir/Unity/unity_0_cxx.cxx.obj FAILED: src/arrow/flight/CMakeFiles/arrow_flight_testing_shared.dir/Unity/unity_0_cxx.cxx.obj D:\a\_temp\msys64\mingw32\bin\ccache.exe D:\a\_temp\msys64\mingw32\bin\c++.exe -DARROW_FLIGHT_EXPORTING -DARROW_HAVE_RUNTIME_AVX2 -DARROW_HAVE_RUNTIME_BMI2 -DARROW_HAVE_RUNTIME_SSE4_2 -DARROW_HAVE_SSE4_2 -DARROW_HDFS -DARROW_WITH_BROTLI -DARROW_WITH_BZ2 -DARROW_WITH_LZ4 -DARROW_WITH_RE2 -DARROW_WITH_SNAPPY -DARROW_WITH_UTF8PROC -DARROW_WITH_ZLIB -DARROW_WITH_ZSTD -DAWS_SDK_VERSION_MAJOR=1 -DAWS_SDK_VERSION_MINOR=8 -DAWS_SDK_VERSION_PATCH=149 -DAWS_USE_IO_COMPLETION_PORTS -DBOOST_USE_WINDOWS_H=1 -DGRPC_NAMESPACE_FOR_TLS_CREDENTIALS_OPTIONS=grpc::experimental -DGRPC_USE_CERTIFICATE_VERIFIER -DGRPC_USE_TLS_CHANNEL_CREDENTIALS_OPTIONS -DGTEST_LINKED_AS_SHARED_LIBRARY=1 -DURI_STATIC_BUILD -DUSE_IMPORT_EXPORT -DUSE_IMPORT_EXPORT=1 -DUSE_WINDOWS_DLL_SEMANTICS -D_CRT_SECURE_NO_WARNINGS -D_ENABLE_EXTENDED_ALIGNED_STORAGE -Darrow_flight_testing_shared_EXPORTS -ID:/a/arrow/arrow/build/cpp/src -ID:/a/arrow/arrow/cpp/src -ID:/a/arrow/arrow/cpp/src/generated -isystem D:/a/arrow/arrow/cpp/thirdparty/flatbuffers/include -isystem /mingw32/include -isystem D:/a/arrow/arrow/build/cpp/xsimd_ep/src/xsimd_ep-install/include -isystem D:/a/arrow/arrow/cpp/thirdparty/hadoop/include -Wno-noexcept-type -Wno-subobject-linkage -fdiagnostics-color=always -O3 -DNDEBUG -Wa,-mbig-obj -Wall -Wno-conversion -Wno-deprecated-declarations -Wno-sign-conversion -Wunused-result -fno-semantic-interposition -mxsave -msse4.2 -O3 -DNDEBUG -std=c++11 -MD -MT 
src/arrow/flight/CMakeFiles/arrow_flight_testing_shared.dir/Unity/unity_0_cxx.cxx.obj -MF src\arrow\flight\CMakeFiles\arrow_flight_testing_shared.dir\Unity\unity_0_cxx.cxx.obj.d -o src/arrow/flight/CMakeFiles/arrow_flight_testing_shared.dir/Unity/unity_0_cxx.cxx.obj -c D:/a/arrow/arrow/build/cpp/src/arrow/flight/CMakeFiles/arrow_flight_testing_shared.dir/Unity/unity_0_cxx.cxx In file included from D:/a/arrow/arrow/build/cpp/src/arrow/flight/CMakeFiles/arrow_flight_testing_shared.dir/Unity/unity_0_cxx.cxx:5: D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc: In function 'arrow::Status arrow::flight::ExampleTlsCertificates(std::vector*)': D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc:775:31: error: variable 'std::ifstream cert_file' has initializer but incomplete type 775 | std::ifstream cert_file(cert_path.str()); | ^ D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc:782:30: error: variable 'std::ifstream key_file' has initializer but incomplete type 782 | std::ifstream key_file(key_path.str()); | ^~~~ D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc:790:42: error: expected unqualified-id before '&' token 790 | } catch (const std::ifstream::failure& e) { | ^ D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc:790:42: error: expected ')' before '&' token 790 | } catch (const std::ifstream::failure& e) { | ~ ^ | ) D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc:790:42: error: expected '{' before '&' token D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc:790:44: error: 'e' was not declared in this scope 790 | } catch (const std::ifstream::failure& e) { | ^ D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc: In function 'arrow::Status arrow::flight::ExampleTlsCertificateRoot(arrow::flight::CertKeyPair*)': D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc:805:29: error: variable 'std::ifstream cert_file' has initializer but incomplete type 805 | std::ifstream cert_file(path.str()); | ^~~~ D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc:814:40: error: 
expected unqualified-id before '&' token 814 | } catch (const std::ifstream::failure& e) { | ^ D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc:814:40: error: expected ')' before '&' token 814 | } catch (const std::ifstream::failure& e) { | ~ ^ | ) D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc:814:40: error: expected '{' before
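The "incomplete type" errors above usually mean std::ifstream was only forward-declared in that translation unit (e.g. via <iosfwd>), so including <fstream> is the likely fix. A minimal stand-alone reproduction of the failing pattern, not the actual test_util.cc code:

```cpp
// Without #include <fstream>, std::ifstream can be an incomplete type in a
// TU that only saw a forward declaration, producing exactly the
// "has initializer but incomplete type" errors in the log above.
#include <fstream>   // the include that makes std::ifstream a complete type
#include <iterator>
#include <string>

// Minimal stand-in for the pattern in ExampleTlsCertificates().
bool ReadAll(const std::string& path, std::string* out) {
  std::ifstream file(path);  // compiles only once <fstream> is included
  if (!file) return false;
  out->assign(std::istreambuf_iterator<char>(file),
              std::istreambuf_iterator<char>());
  return true;
}
```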
[jira] [Created] (ARROW-16232) [C++] Include OpenTelemetry in LICENSE.txt
David Li created ARROW-16232: Summary: [C++] Include OpenTelemetry in LICENSE.txt Key: ARROW-16232 URL: https://issues.apache.org/jira/browse/ARROW-16232 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: David Li Assignee: David Li Fix For: 8.0.0 While I don't think we're distributing it yet, we shouldn't forget to do this. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16221) [C++][Docs] Provide more complete linking/CMake project example
David Li created ARROW-16221: Summary: [C++][Docs] Provide more complete linking/CMake project example Key: ARROW-16221 URL: https://issues.apache.org/jira/browse/ARROW-16221 Project: Apache Arrow Issue Type: Improvement Components: C++, Documentation Reporter: David Li While there's a minimal example of using CMake to link against Arrow, a fuller example (or two) showing some of the Arrow libraries, the bundled dependencies (in the static build), etc. would also be useful. -- This message was sent by Atlassian Jira (v8.20.1#820001)
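A sketch of the kind of fuller example the ticket asks for. The namespaced target names here assume a recent Arrow release (older releases exported plain {{arrow_shared}}/{{arrow_static}} targets without the {{Arrow::}} namespace), so treat them as illustrative:

```cmake
cmake_minimum_required(VERSION 3.16)
project(arrow_example CXX)

# Works once Arrow's install prefix is on CMAKE_PREFIX_PATH.
find_package(Arrow REQUIRED)

add_executable(example main.cc)

# Shared linkage; for a static build with bundled dependencies, link
# Arrow::arrow_static instead.
target_link_libraries(example PRIVATE Arrow::arrow_shared)
```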
[jira] [Created] (ARROW-16217) [C++][FlightRPC] Don't use ExecutionError in Flight SQL
David Li created ARROW-16217: Summary: [C++][FlightRPC] Don't use ExecutionError in Flight SQL Key: ARROW-16217 URL: https://issues.apache.org/jira/browse/ARROW-16217 Project: Apache Arrow Issue Type: Improvement Components: C++, FlightRPC Reporter: David Li ExecutionError is meant for Gandiva; we should use a more relevant error. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-16216) [Python][FlightRPC] Fix test_flight.py when flight is not available
David Li created ARROW-16216: Summary: [Python][FlightRPC] Fix test_flight.py when flight is not available Key: ARROW-16216 URL: https://issues.apache.org/jira/browse/ARROW-16216 Project: Apache Arrow Issue Type: Bug Components: FlightRPC, Python Reporter: Kouhei Sutou Assignee: David Li https://github.com/apache/arrow/pull/12749#discussion_r851671770 {{flight}} is {{None}} when Flight is not built, so don't use the module at module level. -- This message was sent by Atlassian Jira (v8.20.1#820001)
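The guarded-import pattern the fix implies can be sketched as follows (the BasicAuth usage is illustrative):

```python
# Import pyarrow.flight defensively and never touch its attributes at
# module level, since `flight` is None on builds without Flight.
try:
    import pyarrow.flight as flight
except ImportError:
    flight = None

def make_basic_auth(user, password):
    # Deferring attribute access into a function keeps module import safe.
    if flight is None:
        raise RuntimeError("Flight is not available in this build")
    return flight.BasicAuth(user, password)
```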
[jira] [Created] (ARROW-16215) [C++][FlightRPC] Segfault in TestBasicAuthHandler.FailUnauthenticatedCalls
David Li created ARROW-16215: Summary: [C++][FlightRPC] Segfault in TestBasicAuthHandler.FailUnauthenticatedCalls Key: ARROW-16215 URL: https://issues.apache.org/jira/browse/ARROW-16215 Project: Apache Arrow Issue Type: Bug Components: C++, FlightRPC Reporter: David Li {noformat} [ RUN ] TestBasicAuthHandler.FailUnauthenticatedCalls C:/projects/arrow/cpp/src/arrow/flight/client.cc:363: Close() failed: IOError: Flight returned unauthenticated error, with message: Invalid token. Detail: Unauthenticated. gRPC client debug context: {"created":"@1650191019.67300","description":"Error received from peer ipv4:127.0.0.1:1955","file":"D:\bld\grpc-cpp_1646464801475\work\src\core\lib\surface\call.cc","file_line":904,"grpc_message":"Invalid token. Detail: Unauthenticated","grpc_status":16}. Client context: OK. Detail: Unauthenticated unknown file: error: SEH exception with code 0xc005 thrown in the test body. {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-16205) [C++][FlightRPC] Flight does not build in MacOS release verification
David Li created ARROW-16205: Summary: [C++][FlightRPC] Flight does not build in MacOS release verification Key: ARROW-16205 URL: https://issues.apache.org/jira/browse/ARROW-16205 Project: Apache Arrow Issue Type: Bug Reporter: Kouhei Sutou Assignee: David Li Fix For: 8.0.0 https://github.com/apache/arrow/pull/12749#issuecomment-1100388959 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-16173) [C++] Add benchmarks for temporal functions/kernels
David Li created ARROW-16173: Summary: [C++] Add benchmarks for temporal functions/kernels Key: ARROW-16173 URL: https://issues.apache.org/jira/browse/ARROW-16173 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: David Li See ML: https://lists.apache.org/thread/bp2f036sgfj72o46yqmglnx20zfc6tfq -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-16162) [C++][FlightRPC] Flight does not build on Ubuntu 18.04
David Li created ARROW-16162: Summary: [C++][FlightRPC] Flight does not build on Ubuntu 18.04 Key: ARROW-16162 URL: https://issues.apache.org/jira/browse/ARROW-16162 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Kouhei Sutou Assignee: David Li See this nightly for instance: https://github.com/ursacomputing/crossbow/runs/5953173410?check_suite_focus=true#step:5:8623 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-16149) [Python][FlightRPC] Expose UCX transport to Python
David Li created ARROW-16149: Summary: [Python][FlightRPC] Expose UCX transport to Python Key: ARROW-16149 URL: https://issues.apache.org/jira/browse/ARROW-16149 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC, Python Reporter: David Li The UCX transport lives in a separate shared library, which may complicate distribution (though for 8.0.0 we probably don't care about that yet). -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-16146) [C++] arrow-gcsfs-test is timing out
David Li created ARROW-16146: Summary: [C++] arrow-gcsfs-test is timing out Key: ARROW-16146 URL: https://issues.apache.org/jira/browse/ARROW-16146 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: David Li {noformat} The following tests FAILED: 101 - arrow-gcsfs-test (Timeout) {noformat} Appears to have started with [an unrelated minor PR|https://github.com/apache/arrow/commit/e047c9a6c9df565b86143036cc6bab26d3a59306]. Observed on master and across several PRs. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-16145) [C++] Vector kernels should implement or reject null_handling = INTERSECTION
David Li created ARROW-16145: Summary: [C++] Vector kernels should implement or reject null_handling = INTERSECTION Key: ARROW-16145 URL: https://issues.apache.org/jira/browse/ARROW-16145 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: David Li As discovered in ARROW-13530, right now the framework will let you register a vector kernel with null_handling = INTERSECTION, but doesn't actually implement that (it'll preallocate but won't compute the result). We should either implement it, or decide it makes no sense and explicitly reject registering kernels with this null handling mode. -- This message was sent by Atlassian Jira (v8.20.1#820001)
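What INTERSECTION semantics would compute, sketched in pure Python over boolean validity vectors (Arrow uses packed bitmaps; booleans keep the sketch simple):

```python
# NullHandling::INTERSECTION means an output slot is valid only where every
# input is valid. Arrow does the equivalent with a bitwise AND over packed
# validity bitmaps.
def intersect_validity(*validities):
    return [all(bits) for bits in zip(*validities)]
```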
[jira] [Created] (ARROW-16135) [C++][FlightRPC] Investigate TSAN with gRPC/UCX tests
David Li created ARROW-16135: Summary: [C++][FlightRPC] Investigate TSAN with gRPC/UCX tests Key: ARROW-16135 URL: https://issues.apache.org/jira/browse/ARROW-16135 Project: Apache Arrow Issue Type: Improvement Components: C++, FlightRPC Reporter: David Li The gRPC Flight tests trigger lots of TSAN errors and the UCX Flight tests segfault inside UCX when TSAN is enabled. [This gRPC issue|https://github.com/grpc/grpc/issues/16749] is quite old, but suggests we need to build gRPC itself with TSAN. We should investigate these cases. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-16127) [C++][FlightRPC] Improve concurrent call implementation in UCX client
David Li created ARROW-16127: Summary: [C++][FlightRPC] Improve concurrent call implementation in UCX client Key: ARROW-16127 URL: https://issues.apache.org/jira/browse/ARROW-16127 Project: Apache Arrow Issue Type: Improvement Components: C++, FlightRPC Reporter: David Li This currently relies on a pool of workers and endpoints; ideally we would be able to share a worker or even better multiplex multiple calls over a single endpoint (this would require wire protocol changes, however!). Care should be taken not to hurt performance if we do enable a multithreaded worker (which would be necessary, unless we switch to a model where all threads send work to a single worker thread). -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-16126) [C++][FlightRPC] Pipeline memory allocation/registration
David Li created ARROW-16126: Summary: [C++][FlightRPC] Pipeline memory allocation/registration Key: ARROW-16126 URL: https://issues.apache.org/jira/browse/ARROW-16126 Project: Apache Arrow Issue Type: Improvement Components: C++, FlightRPC Reporter: David Li Where possible in the UCX transport, we should allocate and register buffers in the background instead of blocking the thread doing UCX work. -- This message was sent by Atlassian Jira (v8.20.1#820001)
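The idea can be sketched with a stdlib thread pool (illustrative only, not the UCX transport code): allocations are submitted ahead of need, and the transport thread only blocks if a buffer is not ready yet.

```python
from concurrent.futures import ThreadPoolExecutor

def make_buffer(size):
    # Stand-in for "allocate and register a buffer with UCX".
    return bytearray(size)

# Buffers are prepared on a background thread while the transport thread
# keeps doing UCX work.
pool = ThreadPoolExecutor(max_workers=1)
pending = [pool.submit(make_buffer, 1 << 20) for _ in range(4)]

# Later, on the transport thread: blocks only if allocation hasn't finished.
buffers = [f.result() for f in pending]
pool.shutdown(wait=True)
```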
[jira] [Created] (ARROW-16125) [C++][FlightRPC] Implement shutdown with deadline for UCX
David Li created ARROW-16125: Summary: [C++][FlightRPC] Implement shutdown with deadline for UCX Key: ARROW-16125 URL: https://issues.apache.org/jira/browse/ARROW-16125 Project: Apache Arrow Issue Type: Improvement Components: C++, FlightRPC Reporter: David Li The UCX server in ARROW-15706 does not implement shutdown with deadline. -- This message was sent by Atlassian Jira (v8.20.1#820001)
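The shape of the missing feature, sketched with threading primitives (illustrative, not the Flight C++ API): signal the serving loop to stop, then wait at most the deadline for it to wind down.

```python
import threading

# Sketch of "shutdown with deadline": ask the server loop to stop, wait up
# to `deadline` seconds for in-flight work, then report whether it made it.
def shutdown_with_deadline(stop_event, finished_event, deadline):
    stop_event.set()                              # tell the serving loop to exit
    return finished_event.wait(timeout=deadline)  # True if it stopped in time

stop, finished = threading.Event(), threading.Event()
worker = threading.Thread(target=lambda: (stop.wait(), finished.set()))
worker.start()
ok = shutdown_with_deadline(stop, finished, deadline=5.0)
worker.join()
```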
[jira] [Created] (ARROW-16124) [C++][FlightRPC] UCX server should be able to shed load
David Li created ARROW-16124: Summary: [C++][FlightRPC] UCX server should be able to shed load Key: ARROW-16124 URL: https://issues.apache.org/jira/browse/ARROW-16124 Project: Apache Arrow Issue Type: Improvement Components: C++, FlightRPC Reporter: David Li The UCX server from ARROW-15706 will accept connections and put them into a queue to be handled. If they aren't handled quickly enough this can lead to a lot of clients stuck waiting for the server. The server should reject connections if too many pile up so the client can error or retry or connect to a different server. (This is a pitfall of gRPC/Java that we should avoid here.) -- This message was sent by Atlassian Jira (v8.20.1#820001)
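The load-shedding behavior can be sketched with a bounded accept queue (illustrative, not the UCX server code): once the queue is full, new connections are rejected immediately instead of piling up.

```python
import queue

# Once `maxsize` connections are waiting, new ones are rejected right away
# so the client can error, retry, or connect to a different server instead
# of hanging.
accept_queue = queue.Queue(maxsize=2)

def offer_connection(conn):
    try:
        accept_queue.put_nowait(conn)
        return True    # queued for a handler
    except queue.Full:
        return False   # shed load: tell the client to go elsewhere
```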
[jira] [Created] (ARROW-16116) [C++] Properly handle non-nullable fields in Parquet reading
David Li created ARROW-16116: Summary: [C++] Properly handle non-nullable fields in Parquet reading Key: ARROW-16116 URL: https://issues.apache.org/jira/browse/ARROW-16116 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: David Li ARROW-15961 found that the Parquet Arrow reader wasn't respecting the nullable aspect of fields; we need to ensure that if we reconstruct an array for a non-nullable field, it has no validity bitmap. We also need to add tests for this case: it's implicitly tested in a few places, but we should explicitly test it for all supported types. -- This message was sent by Atlassian Jira (v8.20.1#820001)
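The invariant the ticket wants can be sketched in pure Python (booleans stand in for Arrow's packed validity bitmap; this is not the reader's code):

```python
# When reconstructing an array for a non-nullable field, no validity bitmap
# should survive: either there was none, or it was all-valid and can be
# dropped.
def finalize_validity(validity, nullable):
    if not nullable:
        assert validity is None or all(validity), "nulls in non-nullable field"
        return None   # non-nullable arrays carry no validity bitmap
    return validity
```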