[jira] [Updated] (ARROW-4748) [Rust] [DataFusion] GROUP BY performance could be optimized
[ https://issues.apache.org/jira/browse/ARROW-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-4748: -- Labels: pull-request-available (was: ) > [Rust] [DataFusion] GROUP BY performance could be optimized > --- > > Key: ARROW-4748 > URL: https://issues.apache.org/jira/browse/ARROW-4748 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Affects Versions: 0.12.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > > The logic to build the group by keys is row-based, performing an array > downcast on every single group by value. This could be done in a columnar way > instead. > > I also wonder if it is possible to avoid converting the result map to an > array of map entries. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6864) [C++] bz2 / zstd tests not enabled
[ https://issues.apache.org/jira/browse/ARROW-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-6864: -- Labels: pull-request-available (was: ) > [C++] bz2 / zstd tests not enabled > -- > > Key: ARROW-6864 > URL: https://issues.apache.org/jira/browse/ARROW-6864 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.15.0 >Reporter: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > When passing {{-DARROW_WITH_ZSTD=on}} and {{-DARROW_WITH_BZ2=on}}, the > relevant tests in {{arrow-compression-test}} and {{arrow-io-compressed-test}} > are still not enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-4748) [Rust] [DataFusion] GROUP BY performance could be optimized
[ https://issues.apache.org/jira/browse/ARROW-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove reassigned ARROW-4748: - Assignee: Andy Grove > [Rust] [DataFusion] GROUP BY performance could be optimized > --- > > Key: ARROW-4748 > URL: https://issues.apache.org/jira/browse/ARROW-4748 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Affects Versions: 0.12.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > > The logic to build the group by keys is row-based, performing an array > downcast on every single group by value. This could be done in a columnar way > instead. > > I also wonder if it is possible to avoid converting the result map to an > array of map entries. -- This message was sent by Atlassian Jira (v8.3.4#803005)
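The row-based versus columnar distinction in this issue can be sketched in plain Python. This is only an illustration of the shape of the change, not DataFusion code: the `(type_tag, values)` column representation, the `CONVERTERS` table and both function names are hypothetical, and the dictionary lookup merely stands in for the array downcast mentioned above.

```python
from typing import List, Sequence, Tuple

# Hypothetical column representation: (type_tag, values). Not an Arrow API.
Column = Tuple[str, Sequence]

# Stand-in for "downcasting" a generic array to a concrete typed array.
CONVERTERS = {"int": int, "str": str}


def keys_row_based(columns: List[Column], num_rows: int) -> List[tuple]:
    """Build group keys row by row, resolving the column type for every value."""
    keys = []
    for row in range(num_rows):
        key = []
        for type_tag, values in columns:
            convert = CONVERTERS[type_tag]  # repeated once per value
            key.append(convert(values[row]))
        keys.append(tuple(key))
    return keys


def keys_columnar(columns: List[Column], num_rows: int) -> List[tuple]:
    """Resolve each column's type once, convert the whole column, then zip."""
    converted = []
    for type_tag, values in columns:
        convert = CONVERTERS[type_tag]  # done once per column
        converted.append([convert(v) for v in values[:num_rows]])
    return list(zip(*converted))
```

Both functions produce the same keys; the columnar version performs the type resolution once per column instead of once per value, which is the saving this issue is asking for.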
[jira] [Updated] (ARROW-6864) [C++] bz2 / zstd tests not enabled
[ https://issues.apache.org/jira/browse/ARROW-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-6864: Fix Version/s: 1.0.0 > [C++] bz2 / zstd tests not enabled > -- > > Key: ARROW-6864 > URL: https://issues.apache.org/jira/browse/ARROW-6864 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.15.0 >Reporter: Antoine Pitrou >Priority: Major > Fix For: 1.0.0 > > > When passing {{-DARROW_WITH_ZSTD=on}} and {{-DARROW_WITH_BZ2=on}}, the > relevant tests in {{arrow-compression-test}} and {{arrow-io-compressed-test}} > are still not enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6864) [C++] bz2 / zstd tests not enabled
[ https://issues.apache.org/jira/browse/ARROW-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950121#comment-16950121 ] Wes McKinney commented on ARROW-6864: - Probably caused by my change to the flags. I'll take a look > [C++] bz2 / zstd tests not enabled > -- > > Key: ARROW-6864 > URL: https://issues.apache.org/jira/browse/ARROW-6864 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.15.0 >Reporter: Antoine Pitrou >Priority: Major > Fix For: 1.0.0 > > > When passing {{-DARROW_WITH_ZSTD=on}} and {{-DARROW_WITH_BZ2=on}}, the > relevant tests in {{arrow-compression-test}} and {{arrow-io-compressed-test}} > are still not enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6806) [C++] Segfault deserializing ListArray containing null/empty list
[ https://issues.apache.org/jira/browse/ARROW-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-6806: Summary: [C++] Segfault deserializing ListArray containing null/empty list (was: Segfault deserializing ListArray containing null/empty list) > [C++] Segfault deserializing ListArray containing null/empty list > - > > Key: ARROW-6806 > URL: https://issues.apache.org/jira/browse/ARROW-6806 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.15.0 >Reporter: Max Bolingbroke >Assignee: Antoine Pitrou >Priority: Critical > Labels: pull-request-available > Fix For: 1.0.0, 0.15.1 > > Time Spent: 40m > Remaining Estimate: 0h > > The following code segfaults for me (Windows and Linux, pyarrow 0.15): > > {code:java} > import pyarrow as pa > from io import BytesIO > x = > b'\xdc\x00\x00\x00\x10\x00\x00\x00\x0c\x00\x0e\x00\x06\x00\r\x00\x08\x00\x00\x00\x0c\x00\x00\x00\x00\x00\x03\x00\x10\x00\x00\x00\x00\x01\n\x00\x0c\x00\x00\x00\x08\x00\x04\x00\n\x00\x00\x00\x08\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x18\x00\x00\x00\x00\x00\x12\x00\x18\x00\x14\x00\x13\x00\x12\x00\x0c\x00\x00\x00\x08\x00\x04\x00\x12\x00\x00\x00\x14\x00\x00\x00\x14\x00\x00\x00`\x00\x00\x00\x00\x00\x0c\x01\\\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x18\x00\x00\x00\x00\x00\x12\x00\x18\x00\x14\x00\x00\x00\x13\x00\x0c\x00\x00\x00\x08\x00\x04\x00\x12\x00\x00\x00\x14\x00\x00\x00\x14\x00\x00\x00\x14\x00\x00\x00\x00\x00\x00\x05\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf0\xff\xff\xff\x06\x00\x00\x00$data$\x00\x00\x04\x00\x04\x00\x04\x00\x00\x00\x10\x00\x00\x00exchangeCodeList\x00\x00\x00\x00\xcc\x00\x00\x00\x14\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x16\x00\x0e\x00\x15\x00\x10\x00\x04\x00\x0c\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\x10\x00\x00\x00\x00\x03\n\x00\x18\x00\x0c\x00\x08\x00\x04\x00\n\x00\x00\x00\x14\x00\x00\x00h\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x0
0\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' > r = pa.RecordBatchStreamReader(BytesIO(x)) > r.read_all() > {code} > I *think* what should happen instead is that I should get a Table with a > single column named "exchangeCodeList", where the column is a ChunkedArray > with a single chunk, where that chunk is a ListArray containing just a single > element (a null). Failing that (i.e. if the bytestring is actually > malformed), pyarrow should maybe throw an error instead of segfaulting? > I'm not 100% sure how the bytestring was generated: I think it comes from a > Java-based server. I can deserialize the server response fine if all the > records have at least one element in the "exchangeCodeList" column, but not > if at least one of them is null. I've tried to reproduce the failure by > generating the bytestring with pyarrow but can't trigger the segfault. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-6860) [Python] Only link libarrow_flight.so to pyarrow._flight
[ https://issues.apache.org/jira/browse/ARROW-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-6860. - Resolution: Fixed Issue resolved by pull request 5627 [https://github.com/apache/arrow/pull/5627] > [Python] Only link libarrow_flight.so to pyarrow._flight > > > Key: ARROW-6860 > URL: https://issues.apache.org/jira/browse/ARROW-6860 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0, 0.15.1 > > Time Spent: 1.5h > Remaining Estimate: 0h > > See BEAM-8368. We need to find a strategy to mitigate protobuf static linking > issues with the Beam community. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6867) [FlightRPC][Java] Flight server can hang JVM on shutdown
[ https://issues.apache.org/jira/browse/ARROW-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-6867: -- Labels: pull-request-available (was: ) > [FlightRPC][Java] Flight server can hang JVM on shutdown > > > Key: ARROW-6867 > URL: https://issues.apache.org/jira/browse/ARROW-6867 > Project: Apache Arrow > Issue Type: Bug > Components: FlightRPC, Java >Affects Versions: 0.15.0 >Reporter: David Li >Assignee: David Li >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > I noticed this while working on Flight integration tests. FlightService keeps > an executor, which can hang the JVM on shutdown if the executor itself is not > shut down. > It's used by Handshake and DoPut. > I think this surfaced because I wrote an AuthHandler that threw an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-5680) [Rust] datafusion group-by tests depends on result set order
[ https://issues.apache.org/jira/browse/ARROW-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-5680. --- Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 5622 [https://github.com/apache/arrow/pull/5622] > [Rust] datafusion group-by tests depends on result set order > > > Key: ARROW-5680 > URL: https://issues.apache.org/jira/browse/ARROW-5680 > Project: Apache Arrow > Issue Type: Bug > Components: Rust - DataFusion >Reporter: Francois Saint-Jacques >Assignee: Andy Grove >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > See > https://circleci.com/gh/ursa-labs/crossbow/223?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link > once I properly export ARROW_TEST_DATA and PARQUET_TEST_DATA, I get further > failures, e.g. > {code:bash} > running 18 tests > test csv_query_group_by_int_min_max ... FAILED > test csv_query_external_table_count ... ok > test csv_query_count ... ok > test csv_count_star ... ok > test csv_query_avg ... ok > test csv_query_avg_multi_batch ... ok > test csv_query_cast ... ok > test csv_query_group_by_avg ... FAILED > test csv_query_group_by_string_min_max ... FAILED > test csv_query_group_by_int_count ... FAILED > test csv_query_limit ... ok > test csv_query_limit_bigger_than_nbr_of_rows ... ok > test csv_query_limit_with_same_nbr_of_rows ... ok > test csv_query_cast_literal ... ok > test csv_query_limit_zero ... ok > test csv_query_create_external_table ... ok > test csv_query_with_predicate ... ok > test parquet_query ... 
ok > failures: > csv_query_group_by_int_min_max stdout > thread 'csv_query_group_by_int_min_max' panicked at 'assertion failed: `(left > == right)` > left: > `"4\t0.02182578039211991\t0.9237877978193884\n5\t0.0147930530301\t0.9723580396501548\n2\t0.16301110515739792\t0.991517828651004\n3\t0.047343434291126085\t0.9293883502480845\n1\t0.05636955101974106\t0.9965400387585364\n"`, > right: > `"4\t0.02182578039211991\t0.9237877978193884\n2\t0.16301110515739792\t0.991517828651004\n5\t0.0147930530301\t0.9723580396501548\n3\t0.047343434291126085\t0.9293883502480845\n1\t0.05636955101974106\t0.9965400387585364\n"`', > datafusion/tests/sql.rs:77:5 > note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace. > csv_query_group_by_avg stdout > thread 'csv_query_group_by_avg' panicked at 'assertion failed: `(left == > right)` > left: > `"\"a\"\t0.48754517466109415\n\"e\"\t0.48600669271341534\n\"d\"\t0.48855379387549824\n\"c\"\t0.6600456536439784\n\"b\"\t0.41040709263815384\n"`, > right: > `"\"d\"\t0.48855379387549824\n\"c\"\t0.6600456536439784\n\"b\"\t0.41040709263815384\n\"a\"\t0.48754517466109415\n\"e\"\t0.48600669271341534\n"`', > datafusion/tests/sql.rs:99:5 > csv_query_group_by_string_min_max stdout > thread 'csv_query_group_by_string_min_max' panicked at 'assertion failed: > `(left == right)` > left: > `"\"a\"\t0.02182578039211991\t0.9800193410444061\n\"e\"\t0.0147930530301\t0.9965400387585364\n\"d\"\t0.061029375346466685\t0.9748360509016578\n\"c\"\t0.0494924465469434\t0.991517828651004\n\"b\"\t0.04893135681998029\t0.9185813970744787\n"`, > right: > `"\"d\"\t0.061029375346466685\t0.9748360509016578\n\"c\"\t0.0494924465469434\t0.991517828651004\n\"b\"\t0.04893135681998029\t0.9185813970744787\n\"a\"\t0.02182578039211991\t0.9800193410444061\n\"e\"\t0.0147930530301\t0.9965400387585364\n"`', > datafusion/tests/sql.rs:187:5 > csv_query_group_by_int_count stdout > thread 'csv_query_group_by_int_count' panicked at 'assertion failed: `(left > == right)` > left: 
`"\"a\"\t21\n\"e\"\t21\n\"d\"\t18\n\"c\"\t21\n\"b\"\t19\n"`, > right: `"\"d\"\t18\n\"c\"\t21\n\"b\"\t19\n\"a\"\t21\n\"e\"\t21\n"`', > datafusion/tests/sql.rs:175:5 > {code} > I suspect that the tests expect the group-by results in a fixed order. > That order would depend heavily on the iteration order of the hash table. Note that > once I did a rustup update (and docker rmi rustlangrust/nightly), the > failures went away. -- This message was sent by Atlassian Jira (v8.3.4#803005)
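One way to make such an assertion independent of hash-table iteration order is to compare the rows as sorted lists rather than as one concatenated string. The sketch below uses the two strings from the `csv_query_group_by_int_count` failure quoted above; `normalize` is a hypothetical helper, and nothing here is taken from pull request 5622, which may well have solved this differently (for example by sorting inside the test or by adding an ORDER BY).

```python
def normalize(result: str) -> list:
    """Split newline-separated result text into rows and sort them."""
    return sorted(line for line in result.split("\n") if line)


# The two sides of the failed assertion, copied from the test output above.
left = '"a"\t21\n"e"\t21\n"d"\t18\n"c"\t21\n"b"\t19\n'
right = '"d"\t18\n"c"\t21\n"b"\t19\n"a"\t21\n"e"\t21\n'

same_text = left == right
same_rows = normalize(left) == normalize(right)
```

Here `same_text` is false while `same_rows` is true: the two results contain the same five groups and differ only in row order.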
[jira] [Resolved] (ARROW-6690) [Rust] [DataFusion] HashAggregate without GROUP BY should use SIMD
[ https://issues.apache.org/jira/browse/ARROW-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-6690. --- Resolution: Fixed Issue resolved by pull request 5606 [https://github.com/apache/arrow/pull/5606] > [Rust] [DataFusion] HashAggregate without GROUP BY should use SIMD > -- > > Key: ARROW-6690 > URL: https://issues.apache.org/jira/browse/ARROW-6690 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Currently the implementation of HashAggregate in the new physical plan uses > the same logic regardless of whether a grouping expression is used. > For the case where there is no grouping expression, such as "SELECT SUM(a) > FROM b" we can use the compute kernels to perform an aggregate operation on > each batch rather than iterating over each row and accumulating individual > values. > This optimization already exists in the original implementation of aggregate > queries direct from the logical plan. -- This message was sent by Atlassian Jira (v8.3.4#803005)
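The difference between accumulating row by row and aggregating each batch in one call can be sketched as follows. This is a plain-Python illustration, not the DataFusion implementation: batches are modelled as lists of numbers, both function names are hypothetical, and the built-in `sum` only stands in for an Arrow compute kernel (it is not SIMD-accelerated).

```python
from typing import List, Sequence


def sum_row_by_row(batches: List[Sequence[int]]) -> int:
    """Accumulate one value at a time, as the generic grouping path would."""
    total = 0
    for batch in batches:
        for value in batch:
            total += value
    return total


def sum_per_batch(batches: List[Sequence[int]]) -> int:
    """Aggregate each batch with a single bulk call, then combine the partials."""
    partials = [sum(batch) for batch in batches]  # one "kernel" call per batch
    return sum(partials)
```

For a query such as "SELECT SUM(a) FROM b" both paths give the same answer; the per-batch path simply hands the inner loop to a bulk operation.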
[jira] [Assigned] (ARROW-5680) [Rust] datafusion group-by tests depends on result set order
[ https://issues.apache.org/jira/browse/ARROW-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove reassigned ARROW-5680: - Assignee: Andy Grove > [Rust] datafusion group-by tests depends on result set order > > > Key: ARROW-5680 > URL: https://issues.apache.org/jira/browse/ARROW-5680 > Project: Apache Arrow > Issue Type: Bug > Components: Rust - DataFusion >Reporter: Francois Saint-Jacques >Assignee: Andy Grove >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > See > https://circleci.com/gh/ursa-labs/crossbow/223?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link > once I properly export ARROW_TEST_DATA and PARQUET_TEST_DATA, I get further > failures, e.g. > {code:bash} > running 18 tests > test csv_query_group_by_int_min_max ... FAILED > test csv_query_external_table_count ... ok > test csv_query_count ... ok > test csv_count_star ... ok > test csv_query_avg ... ok > test csv_query_avg_multi_batch ... ok > test csv_query_cast ... ok > test csv_query_group_by_avg ... FAILED > test csv_query_group_by_string_min_max ... FAILED > test csv_query_group_by_int_count ... FAILED > test csv_query_limit ... ok > test csv_query_limit_bigger_than_nbr_of_rows ... ok > test csv_query_limit_with_same_nbr_of_rows ... ok > test csv_query_cast_literal ... ok > test csv_query_limit_zero ... ok > test csv_query_create_external_table ... ok > test csv_query_with_predicate ... ok > test parquet_query ... 
ok > failures: > csv_query_group_by_int_min_max stdout > thread 'csv_query_group_by_int_min_max' panicked at 'assertion failed: `(left > == right)` > left: > `"4\t0.02182578039211991\t0.9237877978193884\n5\t0.0147930530301\t0.9723580396501548\n2\t0.16301110515739792\t0.991517828651004\n3\t0.047343434291126085\t0.9293883502480845\n1\t0.05636955101974106\t0.9965400387585364\n"`, > right: > `"4\t0.02182578039211991\t0.9237877978193884\n2\t0.16301110515739792\t0.991517828651004\n5\t0.0147930530301\t0.9723580396501548\n3\t0.047343434291126085\t0.9293883502480845\n1\t0.05636955101974106\t0.9965400387585364\n"`', > datafusion/tests/sql.rs:77:5 > note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace. > csv_query_group_by_avg stdout > thread 'csv_query_group_by_avg' panicked at 'assertion failed: `(left == > right)` > left: > `"\"a\"\t0.48754517466109415\n\"e\"\t0.48600669271341534\n\"d\"\t0.48855379387549824\n\"c\"\t0.6600456536439784\n\"b\"\t0.41040709263815384\n"`, > right: > `"\"d\"\t0.48855379387549824\n\"c\"\t0.6600456536439784\n\"b\"\t0.41040709263815384\n\"a\"\t0.48754517466109415\n\"e\"\t0.48600669271341534\n"`', > datafusion/tests/sql.rs:99:5 > csv_query_group_by_string_min_max stdout > thread 'csv_query_group_by_string_min_max' panicked at 'assertion failed: > `(left == right)` > left: > `"\"a\"\t0.02182578039211991\t0.9800193410444061\n\"e\"\t0.0147930530301\t0.9965400387585364\n\"d\"\t0.061029375346466685\t0.9748360509016578\n\"c\"\t0.0494924465469434\t0.991517828651004\n\"b\"\t0.04893135681998029\t0.9185813970744787\n"`, > right: > `"\"d\"\t0.061029375346466685\t0.9748360509016578\n\"c\"\t0.0494924465469434\t0.991517828651004\n\"b\"\t0.04893135681998029\t0.9185813970744787\n\"a\"\t0.02182578039211991\t0.9800193410444061\n\"e\"\t0.0147930530301\t0.9965400387585364\n"`', > datafusion/tests/sql.rs:187:5 > csv_query_group_by_int_count stdout > thread 'csv_query_group_by_int_count' panicked at 'assertion failed: `(left > == right)` > left: 
`"\"a\"\t21\n\"e\"\t21\n\"d\"\t18\n\"c\"\t21\n\"b\"\t19\n"`, > right: `"\"d\"\t18\n\"c\"\t21\n\"b\"\t19\n\"a\"\t21\n\"e\"\t21\n"`', > datafusion/tests/sql.rs:175:5 > {code} > I suspect that the tests expect the group-by results in a fixed order. > That order would depend heavily on the iteration order of the hash table. Note that > once I did a rustup update (and docker rmi rustlangrust/nightly), the > failures went away. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-6859) [CI][Nightly] Disable docker layer caching for CircleCI tasks
[ https://issues.apache.org/jira/browse/ARROW-6859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson resolved ARROW-6859. Resolution: Fixed Issue resolved by pull request 5617 [https://github.com/apache/arrow/pull/5617] > [CI][Nightly] Disable docker layer caching for CircleCI tasks > - > > Key: ARROW-6859 > URL: https://issues.apache.org/jira/browse/ARROW-6859 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > CircleCI builds are failing because the layer caching is not available for > free plans. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-6867) [FlightRPC][Java] Flight server can hang JVM on shutdown
[ https://issues.apache.org/jira/browse/ARROW-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950062#comment-16950062 ] David Li edited comment on ARROW-6867 at 10/12/19 3:00 PM: --- Aha, the real reason is 1) By default, we share an executor between gRPC and Flight. 2) gRPC doesn't take ownership of the executor, so we need to manually shut it down on exit. The safest thing would be to clean up the executor, and document Flight as owning it. was (Author: lidavidm): Aha, the real reason is 1) By default, we share an executor between gRPC and Flight. 2) gRPC doesn't take ownership of the executor, so we need to manually shut it down on exit. The safest thing would be to use separate executors, and make sure to clean up both executors. (This would also avoid potential deadlocks; gRPC can't process client cancellations if the executor is full.) > [FlightRPC][Java] Flight server can hang JVM on shutdown > > > Key: ARROW-6867 > URL: https://issues.apache.org/jira/browse/ARROW-6867 > Project: Apache Arrow > Issue Type: Bug > Components: FlightRPC, Java >Affects Versions: 0.15.0 >Reporter: David Li >Assignee: David Li >Priority: Major > Fix For: 1.0.0 > > > I noticed this while working on Flight integration tests. FlightService keeps > an executor, which can hang the JVM on shutdown if the executor itself is not > shut down. > It's used by Handshake and DoPut. > I think this surfaced because I wrote an AuthHandler that threw an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6867) [FlightRPC][Java] Flight server can hang JVM on shutdown
[ https://issues.apache.org/jira/browse/ARROW-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950062#comment-16950062 ] David Li commented on ARROW-6867: - Aha, the real reason is 1) By default, we share an executor between gRPC and Flight. 2) gRPC doesn't take ownership of the executor, so we need to manually shut it down on exit. The safest thing would be to use separate executors, and make sure to clean up both executors. (This would also avoid potential deadlocks; gRPC can't process client cancellations if the executor is full.) > [FlightRPC][Java] Flight server can hang JVM on shutdown > > > Key: ARROW-6867 > URL: https://issues.apache.org/jira/browse/ARROW-6867 > Project: Apache Arrow > Issue Type: Bug > Components: FlightRPC, Java >Affects Versions: 0.15.0 >Reporter: David Li >Assignee: David Li >Priority: Major > Fix For: 1.0.0 > > > I noticed this while working on Flight integration tests. FlightService keeps > an executor, which can hang the JVM on shutdown if the executor itself is not > shut down. > It's used by Handshake and DoPut. > I think this surfaced because I wrote an AuthHandler that threw an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6867) [FlightRPC][Java] Flight server can hang JVM on shutdown
David Li created ARROW-6867: --- Summary: [FlightRPC][Java] Flight server can hang JVM on shutdown Key: ARROW-6867 URL: https://issues.apache.org/jira/browse/ARROW-6867 Project: Apache Arrow Issue Type: Bug Components: FlightRPC, Java Affects Versions: 0.15.0 Reporter: David Li Assignee: David Li Fix For: 1.0.0 I noticed this while working on Flight integration tests. FlightService keeps an executor, which can hang the JVM on shutdown if the executor itself is not shut down. It's used by Handshake and DoPut. I think this surfaced because I wrote an AuthHandler that threw an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005)
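The ownership rule discussed in this thread (whoever creates the executor is responsible for shutting it down) can be sketched in Python. This is an analogy only: `OwningServer` and its methods are hypothetical and are not the Flight or gRPC API, and Python's `ThreadPoolExecutor` does not behave identically to a Java executor at process exit.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class OwningServer:
    """Hypothetical server that creates, and therefore owns, its executor."""

    def __init__(self, workers: int = 2) -> None:
        self._executor = ThreadPoolExecutor(max_workers=workers)

    def handle(self, fn, *args) -> Future:
        """Run a request handler on the owned executor."""
        return self._executor.submit(fn, *args)

    def close(self) -> None:
        # wait=True blocks until queued work has finished, so no worker
        # thread is left running after the server has been closed.
        self._executor.shutdown(wait=True)

    def __enter__(self) -> "OwningServer":
        return self

    def __exit__(self, *exc_info) -> None:
        # Runs even if a handler raised, which is the case that exposed the hang.
        self.close()
```

Using the server as a context manager ties the executor's lifetime to the server's, so an exception thrown by a handler cannot skip the shutdown.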
[jira] [Updated] (ARROW-6866) [Java] Improve the performance of calculating hash code for struct vector
[ https://issues.apache.org/jira/browse/ARROW-6866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-6866: -- Labels: pull-request-available (was: ) > [Java] Improve the performance of calculating hash code for struct vector > - > > Key: ARROW-6866 > URL: https://issues.apache.org/jira/browse/ARROW-6866 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Minor > Labels: pull-request-available > > Improve the performance of the hashCode(int) method for StructVector: > 1. We can get the child vectors directly, so there is no need to get the name > from the child vector and then use the name to get the vector. > 2. The child vectors cannot be null, so there is no need to null-check them. > The performance improvement depends on the complexity of the hash algorithm. > For computationally intensive hash algorithms, the improvement can be small, > while for simple hash algorithms it can be notable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6866) [Java] Improve the performance of calculating hash code for struct vector
Liya Fan created ARROW-6866: --- Summary: [Java] Improve the performance of calculating hash code for struct vector Key: ARROW-6866 URL: https://issues.apache.org/jira/browse/ARROW-6866 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Liya Fan Assignee: Liya Fan Improve the performance of the hashCode(int) method for StructVector: 1. We can get the child vectors directly, so there is no need to get the name from the child vector and then use the name to get the vector. 2. The child vectors cannot be null, so there is no need to null-check them. The performance improvement depends on the complexity of the hash algorithm. For computationally intensive hash algorithms, the improvement can be small, while for simple hash algorithms it can be notable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6865) [Java] Improve the performance of comparing an ArrowBuf against a byte array
[ https://issues.apache.org/jira/browse/ARROW-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-6865: -- Labels: pull-request-available (was: ) > [Java] Improve the performance of comparing an ArrowBuf against a byte array > > > Key: ARROW-6865 > URL: https://issues.apache.org/jira/browse/ARROW-6865 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Labels: pull-request-available > > We change the comparison of an ArrowBuf against a byte array from byte-wise > comparison to comparison by long/int/byte. > A benchmark shows a 6.7x performance improvement. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6865) [Java] Improve the performance of comparing an ArrowBuf against a byte array
Liya Fan created ARROW-6865: --- Summary: [Java] Improve the performance of comparing an ArrowBuf against a byte array Key: ARROW-6865 URL: https://issues.apache.org/jira/browse/ARROW-6865 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Liya Fan Assignee: Liya Fan We change the comparison of an ArrowBuf against a byte array from byte-wise comparison to comparison by long/int/byte. A benchmark shows a 6.7x performance improvement. -- This message was sent by Atlassian Jira (v8.3.4#803005)
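The "comparison by long/int/byte" idea can be sketched as: compare 8 bytes at a time while at least 8 remain, then one 4-byte word if possible, then the remaining bytes individually. The Python below only shows that control flow; `equal_wordwise` is a hypothetical name, `bytes` stands in for ArrowBuf, and in Python this shape will not reproduce the speedup reported for the Java change.

```python
import struct


def equal_wordwise(buf: bytes, offset: int, target: bytes) -> bool:
    """Check whether buf[offset:offset + len(target)] equals target."""
    n = len(target)
    if offset < 0 or offset + n > len(buf):
        return False
    i = 0
    # Compare 8 bytes ("long") at a time.
    while n - i >= 8:
        if struct.unpack_from("<q", buf, offset + i) != struct.unpack_from("<q", target, i):
            return False
        i += 8
    # Compare one 4-byte word ("int") if at least 4 bytes remain.
    if n - i >= 4:
        if struct.unpack_from("<i", buf, offset + i) != struct.unpack_from("<i", target, i):
            return False
        i += 4
    # Compare the remaining 0 to 3 bytes one by one.
    while i < n:
        if buf[offset + i] != target[i]:
            return False
        i += 1
    return True
```

The saving comes from issuing roughly one comparison per 8 bytes instead of one per byte; the tail handling keeps the result identical to a byte-wise comparison for any length.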
[jira] [Commented] (ARROW-6864) [C++] bz2 / zstd tests not enabled
[ https://issues.apache.org/jira/browse/ARROW-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950011#comment-16950011 ] Antoine Pitrou commented on ARROW-6864: --- cc [~wesm] > [C++] bz2 / zstd tests not enabled > -- > > Key: ARROW-6864 > URL: https://issues.apache.org/jira/browse/ARROW-6864 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.15.0 >Reporter: Antoine Pitrou >Priority: Major > > When passing {{-DARROW_WITH_ZSTD=on}} and {{-DARROW_WITH_BZ2=on}}, the > relevant tests in {{arrow-compression-test}} and {{arrow-io-compressed-test}} > are still not enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6864) [C++] bz2 / zstd tests not enabled
Antoine Pitrou created ARROW-6864: - Summary: [C++] bz2 / zstd tests not enabled Key: ARROW-6864 URL: https://issues.apache.org/jira/browse/ARROW-6864 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 0.15.0 Reporter: Antoine Pitrou When passing {{-DARROW_WITH_ZSTD=on}} and {{-DARROW_WITH_BZ2=on}}, the relevant tests in {{arrow-compression-test}} and {{arrow-io-compressed-test}} are still not enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6863) [Java] Provide parallel searcher
[ https://issues.apache.org/jira/browse/ARROW-6863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-6863: -- Labels: pull-request-available (was: ) > [Java] Provide parallel searcher > > > Key: ARROW-6863 > URL: https://issues.apache.org/jira/browse/ARROW-6863 > Project: Apache Arrow > Issue Type: New Feature > Components: Java >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Labels: pull-request-available > > For scenarios where the vector is large and a low response time is > required, we need to search the vector in parallel to improve > responsiveness. > This issue aims to provide a parallel searcher for equality semantics > (support for ordering semantics is not ready yet, as we need a way to > distribute the comparator). > The implementation is based on multi-threading. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6863) [Java] Provide parallel searcher
Liya Fan created ARROW-6863: --- Summary: [Java] Provide parallel searcher Key: ARROW-6863 URL: https://issues.apache.org/jira/browse/ARROW-6863 Project: Apache Arrow Issue Type: New Feature Components: Java Reporter: Liya Fan Assignee: Liya Fan For scenarios where the vector is large and a low response time is required, we need to search the vector in parallel to improve responsiveness. This issue aims to provide a parallel searcher for equality semantics (support for ordering semantics is not ready yet, as we need a way to distribute the comparator). The implementation is based on multi-threading. -- This message was sent by Atlassian Jira (v8.3.4#803005)
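A multi-threaded equality search of the kind described here can be sketched by splitting the index range into contiguous chunks and searching each chunk on its own thread. The code below is an assumption-laden illustration, not the proposed Arrow Java class: `parallel_search` is a hypothetical name, a Python list stands in for the vector, and because of Python's global interpreter lock this shows the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Sequence


def parallel_search(values: Sequence, key, num_threads: int = 4) -> int:
    """Return the smallest index i with values[i] == key, or -1 if absent."""
    n = len(values)
    if n == 0:
        return -1
    num_threads = max(1, min(num_threads, n))
    chunk = (n + num_threads - 1) // num_threads  # ceiling division

    def search_range(start: int) -> int:
        # Each worker scans one contiguous chunk using equality only.
        end = min(start + chunk, n)
        for i in range(start, end):
            if values[i] == key:
                return i
        return -1

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(search_range, range(0, n, chunk)))

    hits = [r for r in results if r >= 0]
    return min(hits) if hits else -1
```

Only equality is needed inside each worker, which matches the scope of this issue; an ordering-based search would additionally need the comparator to be shared safely across threads.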
[jira] [Updated] (ARROW-6662) [Java] Implement equals/approxEquals API for VectorSchemaRoot
[ https://issues.apache.org/jira/browse/ARROW-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-6662: -- Labels: pull-request-available (was: ) > [Java] Implement equals/approxEquals API for VectorSchemaRoot > - > > Key: ARROW-6662 > URL: https://issues.apache.org/jira/browse/ARROW-6662 > Project: Apache Arrow > Issue Type: New Feature > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > > With the newly added visitor APIs (ARROW-6211), we can now implement > equals/approxEquals for VectorSchemaRoot. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6850) [Java] Jdbc converter support Null type
[ https://issues.apache.org/jira/browse/ARROW-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-6850: -- Labels: pull-request-available (was: ) > [Java] Jdbc converter support Null type > --- > > Key: ARROW-6850 > URL: https://issues.apache.org/jira/browse/ARROW-6850 > Project: Apache Arrow > Issue Type: New Feature > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > > java.sql.Types.NULL is not supported yet because there was previously no NullVector in the Java > code. > This can be implemented after ARROW-1638 (IPC roundtrip for null > type) is merged. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-6464) [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API
[ https://issues.apache.org/jira/browse/ARROW-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949948#comment-16949948 ] Ji Liu edited comment on ARROW-6464 at 10/12/19 7:40 AM: - Issue resolved by pull request 5293 [https://github.com/apache/arrow/pull/5293] was (Author: tianchen92): Issue resolved in [https://github.com/apache/arrow/pull/5293] > [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API > --- > > Key: ARROW-6464 > URL: https://issues.apache.org/jira/browse/ARROW-6464 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Critical > Labels: pull-request-available > Fix For: 0.15.1 > > Time Spent: 3h > Remaining Estimate: 0h > > Currently {{FixedSizeListVector#splitAndTransfer}} actually uses > {{copyValueSafe}}, which copies memory; we should use the slice API instead. > Meanwhile, {{splitAndTransfer}} in all classes should perform the position index check at > the beginning. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-6661) [Java] Implement APIs like slice to enhance VectorSchemaRoot
[ https://issues.apache.org/jira/browse/ARROW-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949939#comment-16949939 ] Ji Liu edited comment on ARROW-6661 at 10/12/19 7:40 AM: - Issue resolved by pull request 5470 [https://github.com/apache/arrow/pull/5470] was (Author: tianchen92): Issue resolved in [https://github.com/apache/arrow/pull/5470] > [Java] Implement APIs like slice to enhance VectorSchemaRoot > > > Key: ARROW-6661 > URL: https://issues.apache.org/jira/browse/ARROW-6661 > Project: Apache Arrow > Issue Type: New Feature > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Fix For: 0.15.1 > > Time Spent: 2h > Remaining Estimate: 0h > > Currently the Java implementation has no APIs like slice for record batches, > unlike C++/Python. > This issue is to implement slice/getVector/addVector/removeVector. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-6464) [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API
[ https://issues.apache.org/jira/browse/ARROW-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949948#comment-16949948 ] Ji Liu edited comment on ARROW-6464 at 10/12/19 7:39 AM: - Issue resolved in [https://github.com/apache/arrow/pull/5293] was (Author: tianchen92): Issue resolve in [https://github.com/apache/arrow/pull/5293] > [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API > --- > > Key: ARROW-6464 > URL: https://issues.apache.org/jira/browse/ARROW-6464 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Critical > Labels: pull-request-available > Fix For: 0.15.1 > > Time Spent: 3h > Remaining Estimate: 0h > > Currently {{FixedSizeListVector#splitAndTransfer}} actually uses > {{copyValueSafe}}, which copies memory; we should use the slice API instead. > Meanwhile, {{splitAndTransfer}} in all classes should perform the position index check at > the beginning. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-6464) [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API
[ https://issues.apache.org/jira/browse/ARROW-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu resolved ARROW-6464. --- Fix Version/s: 0.15.1 Resolution: Fixed Issue resolve in [https://github.com/apache/arrow/pull/5293] > [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API > --- > > Key: ARROW-6464 > URL: https://issues.apache.org/jira/browse/ARROW-6464 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Critical > Labels: pull-request-available > Fix For: 0.15.1 > > Time Spent: 3h > Remaining Estimate: 0h > > Currently {{FixedSizeListVector#splitAndTransfer}} actually uses > {{copyValueSafe}}, which copies memory; we should use the slice API instead. > Meanwhile, {{splitAndTransfer}} in all classes should perform the position index check at > the beginning. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-6661) [Java] Implement APIs like slice to enhance VectorSchemaRoot
[ https://issues.apache.org/jira/browse/ARROW-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu resolved ARROW-6661. --- Fix Version/s: 0.15.1 Resolution: Fixed Issue resolved in [https://github.com/apache/arrow/pull/5470] > [Java] Implement APIs like slice to enhance VectorSchemaRoot > > > Key: ARROW-6661 > URL: https://issues.apache.org/jira/browse/ARROW-6661 > Project: Apache Arrow > Issue Type: New Feature > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Fix For: 0.15.1 > > Time Spent: 2h > Remaining Estimate: 0h > > Currently the Java implementation has no APIs like slice for record batches, > unlike C++/Python. > This issue is to implement slice/getVector/addVector/removeVector. -- This message was sent by Atlassian Jira (v8.3.4#803005)