[jira] [Updated] (ARROW-13803) [C++] Segfault on filtering taxi dataset

2021-09-06 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-13803:
---
Fix Version/s: 5.0.1

> [C++] Segfault on filtering taxi dataset
> 
>
> Key: ARROW-13803
> URL: https://issues.apache.org/jira/browse/ARROW-13803
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
> Environment: macOS 11.2.1, MacBook Pro (13-inch, M1, 2020)
>Reporter: Neal Richardson
>Assignee: David Li
>Priority: Major
>  Labels: kernel, pull-request-available, query-engine
> Fix For: 5.0.1, 6.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Found this while testing ARROW-13740. Using the nyc-taxi dataset:
> {code}
> ds %>%
>   filter(total_amount > 0, passenger_count > 0) %>%
>   summarise(n = n()) %>%
>   collect()
> {code}
> {code}
>  *** caught segfault ***
> address 0x161784000, cause 'invalid permissions'
> Traceback:
>  1: .Call(`_arrow_ExecPlan_run`, plan, final_node, sort_options)
> ...
> {code}
> lldb shows 
> {code}
> * thread #11, stop reason = EXC_BAD_ACCESS (code=1, address=0x1631a8000)
> frame #0: 0x00013a79d9cc libarrow.600.dylib`arrow::BitUtil::SetBitmap(unsigned char*, long long, long long) + 296
> libarrow.600.dylib`arrow::BitUtil::SetBitmap:
> ->  0x13a79d9cc <+296>: ldrb   w10, [x8]
>     0x13a79d9d0 <+300>: cmp    w9, #0x8  ; =0x8
>     0x13a79d9d4 <+304>: cset   w11, lo
>     0x13a79d9d8 <+308>: and    w9, w9, #0x7
> Target 0: (R) stopped.
> (lldb) 
> {code}
> Interestingly, I can evaluate those filter expressions just fine, and it only 
> seems to crash if both are provided. And I can count over the data with both:
> {code}
> ds %>% 
>   group_by(total_amount > 0, passenger_count > 0) %>% 
>   summarize(n=n()) %>% 
>   collect()
> # A tibble: 4 × 3
>   `total_amount > 0` `passenger_count > 0`          n
>
> 1 FALSE              FALSE                        805
> 2 FALSE              TRUE                      368680
> 3 TRUE               FALSE                    5810556
> 4 TRUE               TRUE                  1541561340
> {code}
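
The lldb frame quoted above stops on a byte load (ldrb) inside arrow::BitUtil::SetBitmap at an address the process cannot access. That would be consistent with a bit range running past the end of its buffer, although nothing in this thread confirms the root cause. As an illustration only, here is a minimal sketch of setting a run of bits in an LSB-first bitmap with an explicit bounds check. SetBitRangeChecked is a hypothetical helper written for this note; it is not Arrow's implementation or API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Arrow): set `length` bits starting at bit
// `offset` in an LSB-first bitmap. Returns false, leaving the bitmap
// untouched, if the requested range would extend past the end of the buffer.
bool SetBitRangeChecked(std::vector<uint8_t>& bitmap, int64_t offset,
                        int64_t length) {
  if (offset < 0 || length < 0) return false;
  const int64_t capacity_bits = static_cast<int64_t>(bitmap.size()) * 8;
  // Written as a subtraction so that offset + length cannot overflow.
  if (offset > capacity_bits || length > capacity_bits - offset) return false;
  for (int64_t i = offset; i < offset + length; ++i) {
    // Bit i lives in byte i / 8, at position i % 8 counted from the LSB.
    bitmap[static_cast<std::size_t>(i / 8)] |=
        static_cast<uint8_t>(1u << (i % 8));
  }
  return true;
}
```

With a 2-byte bitmap, setting 7 bits at offset 3 succeeds, while setting 7 bits at offset 10 is rejected because the range would end at bit 17 of 16. An unchecked version of the same loop would touch memory past the buffer in the second case.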



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-13803) [C++] Segfault on filtering taxi dataset

2021-09-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-13803:
---
Labels: pull-request-available query-engine  (was: query-engine)

> [C++] Segfault on filtering taxi dataset
> 
>
> Key: ARROW-13803
> URL: https://issues.apache.org/jira/browse/ARROW-13803
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
> Environment: macOS 11.2.1, MacBook Pro (13-inch, M1, 2020)
>Reporter: Neal Richardson
>Priority: Major
>  Labels: pull-request-available, query-engine
> Fix For: 6.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Updated] (ARROW-13803) [C++] Segfault on filtering taxi dataset

2021-08-30 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-13803:

Description: 
Found this while testing ARROW-13740. Using the nyc-taxi dataset:

{code}
ds %>%
  filter(total_amount > 0, passenger_count > 0) %>%
  summarise(n = n()) %>%
  collect()
{code}

{code}
 *** caught segfault ***
address 0x161784000, cause 'invalid permissions'

Traceback:
 1: .Call(`_arrow_ExecPlan_run`, plan, final_node, sort_options)
...
{code}

lldb shows 

{code}
* thread #11, stop reason = EXC_BAD_ACCESS (code=1, address=0x1631a8000)
frame #0: 0x00013a79d9cc libarrow.600.dylib`arrow::BitUtil::SetBitmap(unsigned char*, long long, long long) + 296
libarrow.600.dylib`arrow::BitUtil::SetBitmap:
->  0x13a79d9cc <+296>: ldrb   w10, [x8]
    0x13a79d9d0 <+300>: cmp    w9, #0x8  ; =0x8
    0x13a79d9d4 <+304>: cset   w11, lo
    0x13a79d9d8 <+308>: and    w9, w9, #0x7
Target 0: (R) stopped.
(lldb) 
{code}

Interestingly, I can evaluate those filter expressions just fine, and it only 
seems to crash if both are provided. And I can count over the data with both:

{code}
ds %>% 
  group_by(total_amount > 0, passenger_count > 0) %>% 
  summarize(n=n()) %>% 
  collect()

# A tibble: 4 × 3
  `total_amount > 0` `passenger_count > 0`          n

1 FALSE              FALSE                        805
2 FALSE              TRUE                      368680
3 TRUE               FALSE                    5810556
4 TRUE               TRUE                  1541561340
{code}

  was:
Found this while testing ARROW-13740. Using the nyc-taxi dataset:

{code}
ds %>%
  filter(total_amount > 0, passenger_count > 0) %>%
  summarise(n = n()) %>%
  collect()
{code}

{code}
 *** caught segfault ***
address 0x161784000, cause 'invalid permissions'

Traceback:
 1: .Call(`_arrow_ExecPlan_run`, plan, final_node, sort_options)
...
{code}

lldb shows 

{code}
* thread #11, stop reason = EXC_BAD_ACCESS (code=1, address=0x1631a8000)
frame #0: 0x00013a79d9cc libarrow.600.dylib`arrow::BitUtil::SetBitmap(unsigned char*, long long, long long) + 296
libarrow.600.dylib`arrow::BitUtil::SetBitmap:
->  0x13a79d9cc <+296>: ldrb   w10, [x8]
    0x13a79d9d0 <+300>: cmp    w9, #0x8  ; =0x8
    0x13a79d9d4 <+304>: cset   w11, lo
    0x13a79d9d8 <+308>: and    w9, w9, #0x7
Target 0: (R) stopped.
(lldb) 
{code}

Interestingly, I can evaluate those filter expressions just fine, and it only 
seems to crash if both are provided. And I can count over the data with both:

{code}
ds %>% 
  group_by(total_amount > 0, passenger_count > 0)
  %>% summarize(n=n())
  %>% collect()

# A tibble: 4 × 3
  `total_amount > 0` `passenger_count > 0`          n

1 FALSE              FALSE                        805
2 FALSE              TRUE                      368680
3 TRUE               FALSE                    5810556
4 TRUE               TRUE                  1541561340
{code}


> [C++] Segfault on filtering taxi dataset
> 
>
> Key: ARROW-13803
> URL: https://issues.apache.org/jira/browse/ARROW-13803
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
> Environment: macOS 11.2.1, MacBook Pro (13-inch, M1, 2020)
>Reporter: Neal Richardson
>Priority: Major
>  Labels: query-engine
> Fix For: 6.0.0
>
>
> Found this while testing ARROW-13740. Using the nyc-taxi dataset:
> {code}
> ds %>%
>   filter(total_amount > 0, passenger_count > 0) %>%
>   summarise(n = n()) %>%
>   collect()
> {code}
> {code}
>  *** caught segfault ***
> address 0x161784000, cause 'invalid permissions'
> Traceback:
>  1: .Call(`_arrow_ExecPlan_run`, plan, final_node, sort_options)
> ...
> {code}
> lldb shows 
> {code}
> * thread #11, stop reason = EXC_BAD_ACCESS (code=1, address=0x1631a8000)
> frame #0: 0x00013a79d9cc libarrow.600.dylib`arrow::BitUtil::SetBitmap(unsigned char*, long long, long long) + 296
> libarrow.600.dylib`arrow::BitUtil::SetBitmap:
> ->  0x13a79d9cc <+296>: ldrb   w10, [x8]
>     0x13a79d9d0 <+300>: cmp    w9, #0x8  ; =0x8
>     0x13a79d9d4 <+304>: cset   w11, lo
>     0x13a79d9d8 <+308>: and    w9, w9, #0x7
> Target 0: (R) stopped.
> (lldb) 
> {code}
> Interestingly, I can evaluate those filter expressions just fine, and it only 
> seems to crash if both are provided. And I can count over the data with both:
> {code}
> ds %>% 
>   group_by(total_amount > 0, passenger_count > 0) %>% 
>   summarize(n=n()) %>% 
>   collect()
> # A tibble: 4 × 3
>   `total_amount > 0` `passenger_count > 0`  n
>   
> 1 FALSE  FALSE805
> 2 FALSE  TRUE  36868