[jira] [Updated] (ARROW-13803) [C++] Segfault on filtering taxi dataset

     [ https://issues.apache.org/jira/browse/ARROW-13803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou updated ARROW-13803:
---
    Fix Version/s: 5.0.1

> [C++] Segfault on filtering taxi dataset
>
>                 Key: ARROW-13803
>                 URL: https://issues.apache.org/jira/browse/ARROW-13803
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>         Environment: macOS 11.2.1, MacBook Pro (13-inch, M1, 2020)
>            Reporter: Neal Richardson
>            Assignee: David Li
>            Priority: Major
>              Labels: kernel, pull-request-available, query-engine
>             Fix For: 5.0.1, 6.0.0
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Found this while testing ARROW-13740. Using the nyc-taxi dataset:
> {code}
> ds %>%
>   filter(total_amount > 0, passenger_count > 0) %>%
>   summarise(n = n()) %>%
>   collect()
> {code}
> {code}
>  *** caught segfault ***
> address 0x161784000, cause 'invalid permissions'
> Traceback:
>  1: .Call(`_arrow_ExecPlan_run`, plan, final_node, sort_options)
> ...
> {code}
> lldb shows
> {code}
> * thread #11, stop reason = EXC_BAD_ACCESS (code=1, address=0x1631a8000)
>     frame #0: 0x00013a79d9cc libarrow.600.dylib`arrow::BitUtil::SetBitmap(unsigned char*, long long, long long) + 296
> libarrow.600.dylib`arrow::BitUtil::SetBitmap:
> ->  0x13a79d9cc <+296>: ldrb   w10, [x8]
>     0x13a79d9d0 <+300>: cmp    w9, #0x8                  ; =0x8
>     0x13a79d9d4 <+304>: cset   w11, lo
>     0x13a79d9d8 <+308>: and    w9, w9, #0x7
> Target 0: (R) stopped.
> (lldb)
> {code}
> Interestingly, I can evaluate those filter expressions just fine, and it only
> seems to crash if both are provided. And I can count over the data with both:
> {code}
> ds %>%
>   group_by(total_amount > 0, passenger_count > 0) %>%
>   summarize(n=n()) %>%
>   collect()
> # A tibble: 4 × 3
>   `total_amount > 0` `passenger_count > 0`          n
> 1 FALSE              FALSE                        805
> 2 FALSE              TRUE                      368680
> 3 TRUE               FALSE                    5810556
> 4 TRUE               TRUE                  1541561340
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
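
The lldb trace in the message above stops inside `arrow::BitUtil::SetBitmap(unsigned char*, long long, long long)`. Going by that signature alone, the routine appears to take a byte buffer plus a bit offset and a bit length. The sketch below is not Arrow's implementation. It is a minimal standalone illustration, under that assumption, of what setting a range of bits involves and why a range reaching past the end of the buffer would touch memory the caller does not own. The name `SetBitRange` and the explicit `num_bytes` parameter are hypothetical and were added only so the bounds can be checked.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper (not part of Arrow): set `length` bits to 1, starting at
// bit `offset`, in a bitmap of `num_bytes` bytes. Bits are numbered LSB-first
// within each byte, so bit i lives in data[i / 8] at position i % 8.
void SetBitRange(uint8_t* data, int64_t num_bytes, int64_t offset, int64_t length) {
  assert(data != nullptr);
  const int64_t num_bits = num_bytes * 8;
  // Reject any range that would read or write outside the buffer. Without a
  // check like this, an oversized offset or length walks off the allocation.
  if (offset < 0 || length < 0 || offset > num_bits || length > num_bits - offset) {
    throw std::out_of_range("bit range exceeds bitmap");
  }
  for (int64_t i = offset; i < offset + length; ++i) {
    data[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
  }
}
```

For a zeroed two-byte buffer, `SetBitRange(buf, 2, 3, 7)` sets bits 3 through 9, leaving `buf[0] == 0xF8` and `buf[1] == 0x03`, while `SetBitRange(buf, 2, 10, 7)` throws instead of writing past the second byte.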
[jira] [Updated] (ARROW-13803) [C++] Segfault on filtering taxi dataset

     [ https://issues.apache.org/jira/browse/ARROW-13803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-13803:
---
    Labels: pull-request-available query-engine  (was: query-engine)

> [C++] Segfault on filtering taxi dataset
>
>                 Key: ARROW-13803
>                 URL: https://issues.apache.org/jira/browse/ARROW-13803
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>         Environment: macOS 11.2.1, MacBook Pro (13-inch, M1, 2020)
>            Reporter: Neal Richardson
>            Priority: Major
>              Labels: pull-request-available, query-engine
>             Fix For: 6.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (ARROW-13803) [C++] Segfault on filtering taxi dataset

     [ https://issues.apache.org/jira/browse/ARROW-13803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson updated ARROW-13803:
---
    Description:
Found this while testing ARROW-13740. Using the nyc-taxi dataset:

{code}
ds %>%
  filter(total_amount > 0, passenger_count > 0) %>%
  summarise(n = n()) %>%
  collect()
{code}

{code}
 *** caught segfault ***
address 0x161784000, cause 'invalid permissions'

Traceback:
 1: .Call(`_arrow_ExecPlan_run`, plan, final_node, sort_options)
...
{code}

lldb shows

{code}
* thread #11, stop reason = EXC_BAD_ACCESS (code=1, address=0x1631a8000)
    frame #0: 0x00013a79d9cc libarrow.600.dylib`arrow::BitUtil::SetBitmap(unsigned char*, long long, long long) + 296
libarrow.600.dylib`arrow::BitUtil::SetBitmap:
->  0x13a79d9cc <+296>: ldrb   w10, [x8]
    0x13a79d9d0 <+300>: cmp    w9, #0x8                  ; =0x8
    0x13a79d9d4 <+304>: cset   w11, lo
    0x13a79d9d8 <+308>: and    w9, w9, #0x7
Target 0: (R) stopped.
(lldb)
{code}

Interestingly, I can evaluate those filter expressions just fine, and it only seems to crash if both are provided. And I can count over the data with both:

{code}
ds %>%
  group_by(total_amount > 0, passenger_count > 0) %>%
  summarize(n=n()) %>%
  collect()
# A tibble: 4 × 3
  `total_amount > 0` `passenger_count > 0`          n
1 FALSE              FALSE                        805
2 FALSE              TRUE                      368680
3 TRUE               FALSE                    5810556
4 TRUE               TRUE                  1541561340
{code}

> [C++] Segfault on filtering taxi dataset
>
>                 Key: ARROW-13803
>                 URL: https://issues.apache.org/jira/browse/ARROW-13803
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>         Environment: macOS 11.2.1, MacBook Pro (13-inch, M1, 2020)
>            Reporter: Neal Richardson
>            Priority: Major
>              Labels: query-engine
>             Fix For: 6.0.0