amoeba commented on PR #37709:
URL: https://github.com/apache/arrow/pull/37709#issuecomment-1763573360

   Hey @oliviermeslin, I ran your example from 
https://github.com/apache/arrow/issues/37655 without this patch and got the 
expected error after the script printed "Doing the join with 9 variables":
   
   > ! Invalid: There are more than 2^32 bytes of key data.  Acero cannot 
process a join of this magnitude
   
   I then built libarrow and the R package from your patch and got a segfault 
instead of that error. Full output below:
   
   <details>
   <summary>Output with segfault</summary>
   
   ```
   ❯ Rscript acero-join-test.R
   Welcome to R :)
   This session's PID is 2890
   Some features are not enabled in this build of Arrow. Run `arrow_info()` for 
more information.
   
   Attaching package: ‘arrow’
   
   The following object is masked from ‘package:utils’:
   
       timestamp
   
   
   Attaching package: ‘dplyr’
   
   The following objects are masked from ‘package:stats’:
   
       filter, lag
   
   The following objects are masked from ‘package:base’:
   
       intersect, setdiff, setequal, union
   
   [1] "Doing the join with 2 variables"
   [1] "id"        "variable1"
   [1] "Doing the join with 3 variables"
   [1] "id"        "variable1" "variable2"
   [1] "Doing the join with 4 variables"
   [1] "id"        "variable1" "variable2" "variable3"
   [1] "Doing the join with 5 variables"
   [1] "id"        "variable1" "variable2" "variable3" "variable4"
   [1] "Doing the join with 6 variables"
   [1] "id"        "variable1" "variable2" "variable3" "variable4" "variable5"
   [1] "Doing the join with 7 variables"
   [1] "id"        "variable1" "variable2" "variable3" "variable4" "variable5"
   [7] "variable6"
   [1] "Doing the join with 8 variables"
   [1] "id"        "variable1" "variable2" "variable3" "variable4" "variable5"
   [7] "variable6" "variable7"
   [1] "Doing the join with 9 variables"
   [1] "id"        "variable1" "variable2" "variable3" "variable4" "variable5"
   [7] "variable6" "variable7" "variable8"
   
    *** caught segfault ***
   address 0x7ed0457b20, cause 'invalid permissions'
   
    *** caught segfault ***
   address 0x7f6676b420, cause 'invalid permissions'
   
    *** caught segfault ***
   address 0x7ed8992de0, cause 'invalid permissions'
   
    *** caught segfault ***
   address 0x7ee948e360, cause 'invalid permissions'
   
   Traceback:
    1: Table__from_ExecPlanReader(self)
    2: x$read_table()
    3: as_arrow_table.RecordBatchReader(reader)
   
   Traceback:
    1: Table__from_ExecPlanReader(self)
    2: x$read_table()
    3: as_arrow_table.RecordBatchReader(reader)
    4: as_arrow_table(reader)
    5: as_arrow_table.arrow_dplyr_query(x)
    6: as_arrow_table(x)
   
   Traceback:
   
   Traceback:
    1: Table__from_ExecPlanReader(self) 4: as_arrow_table(reader)
    2: x$read_table()
    3: as_arrow_table.RecordBatchReader(reader) 7: doTryCatch(return(expr), 
name, parentenv, handler)
    8: tryCatchOne(expr, names, parentenv, handlers[[1L]])
    9: tryCatchList(expr, classes, parentenv, handlers)
    5:  1:
   as_arrow_table.arrow_dplyr_query(x)
    6: as_arrow_table(x) 4:
    7: doTryCatch(return(expr), name, parentenv, handler)
    8: as_arrow_table(reader)
    5: as_arrow_table.arrow_dplyr_query(x)Table__from_ExecPlanReader(self)
    2: tryCatchOne(expr, names, parentenv, handlers[[1L]])
    9: tryCatchList(expr, classes, parentenv, handlers)
   10: tryCatch(as_arrow_table(x), error = function(e, call = caller_env(n = 
4)) {    augment_io_error_msg(e, call, schema = schema())})
   11: compute.arrow_dplyr_query(left_join(data, select(data, 
all_of(vars_temp)),     by = c(id = "id")))
   12: compute(left_join(data, select(data, all_of(vars_temp)), by = c(id = 
"id")))
   13: FUN(X[[i]], ...)
   14: lapply(1:nb_var, function(n) {    print(paste0("Doing the join with ", n 
+ 1, " variables"))    vars_temp <- c("id", vars[1:n])    print(vars_temp)    
data_out <- compute(left_join(data, select(data, all_of(vars_temp)),         by 
= c(id = "id")))    return("Success!")})
   An irrecoverable exception occurred. R is aborting now ...
   
   10: tryCatch(as_arrow_table(x), error = function(e, call = caller_env(n = 
4)) {    augment_io_error_msg(e, call, schema = schema())(x$read_table())()
    6: as_arrow_table(x)
    7: doTryCatch(return(expr), name, parentenv, handler)
    8: })
   tryCatchOne(expr, names, parentenv, handlers[[1L]])
    9: tryCatchList(expr, classes, parentenv, handlers)
   
    3: as_arrow_table.RecordBatchReader(reader)
    4: as_arrow_table(reader)10: tryCatch(as_arrow_table(x), error = 
function(e, call = caller_env(n = 4)) {
    5: as_arrow_table.arrow_dplyr_query(x)
    6:     augment_io_error_msg(e, call, schema = schema())})
   11: compute.arrow_dplyr_query(left_join(data, select(data, 
all_of(vars_temp)),     by = c(id = "id")))
   12: compute(left_join(data, select(data, all_of(vars_temp)), by = c(id = 
"id")))
   13: FUN(X[[i]], ...)
   14: lapply(1:nb_var, function(n) {    print(paste0("Doing the join with ", n 
+ 1, " variables"))    vars_temp <- c("id", vars[1:n])    print(vars_temp)    
data_out <- compute(left_join(data, select(data, all_of(vars_temp)),         by 
= c(id = "id")))    return("Success!")})
   An irrecoverable exception occurred. R is aborting now ...
   as_arrow_table(x)
    7: doTryCatch(return(expr), name, parentenv, handler)
    8: tryCatchOne(expr, names, parentenv, handlers[[1L]])
    9: tryCatchList(expr, classes, parentenv, handlers)
   10: tryCatch(as_arrow_table(x), error = function(e, call = caller_env(n = 
4)) {    augment_io_error_msg(e, call, schema = schema())})
   11: compute.arrow_dplyr_query(left_join(data, select(data, 
all_of(vars_temp)),     by = c(id = "id")))
   12: compute(left_join(data, select(data, all_of(vars_temp)), by = c(id = 
"id")))
   13: FUN(X[[i]], ...)
   14: lapply(1:nb_var, function(n) {    print(paste0("Doing the join with ", n 
+ 1, " variables"))    vars_temp <- c("id", vars[1:n])    print(vars_temp)    
data_out <- compute(left_join(data, select(data, all_of(vars_temp)),         by 
= c(id = "id")))    return("Success!")})
   An irrecoverable exception occurred. R is aborting now ...
   11: compute.arrow_dplyr_query(left_join(data, select(data, 
all_of(vars_temp)),     by = c(id = "id")))
   12: compute(left_join(data, select(data, all_of(vars_temp)), by = c(id = 
"id")))
   13: FUN(X[[i]], ...)
   14: lapply(1:nb_var, function(n) {    print(paste0("Doing the join with ", n 
+ 1, " variables"))    vars_temp <- c("id", vars[1:n])    print(vars_temp)    
data_out <- compute(left_join(data, select(data, all_of(vars_temp)),         by 
= c(id = "id")))    return("Success!")})
   An irrecoverable exception occurred. R is aborting now ...
   
    *** caught segfault ***
   address 0x7f6ecc2520, cause 'invalid permissions'
   
   Traceback:
    1: Table__from_ExecPlanReader(self)
    2: x$read_table()
    3: as_arrow_table.RecordBatchReader(reader)
    4: as_arrow_table(reader)
    5: as_arrow_table.arrow_dplyr_query(x)
    6: as_arrow_table(x)
    7: doTryCatch(return(expr), name, parentenv, handler)
    8: tryCatchOne(expr, names, parentenv, handlers[[1L]])
    9: tryCatchList(expr, classes, parentenv, handlers)
   10: tryCatch(as_arrow_table(x), error = function(e, call = caller_env(n = 
4)) {    augment_io_error_msg(e, call, schema = schema())})
   11: compute.arrow_dplyr_query(left_join(data, select(data, 
all_of(vars_temp)),     by = c(id = "id")))
   12: compute(left_join(data, select(data, all_of(vars_temp)), by = c(id = 
"id")))
   13: FUN(X[[i]], ...)
   14: lapply(1:nb_var, function(n) {    print(paste0("Doing the join with ", n 
+ 1, " variables"))    vars_temp <- c("id", vars[1:n])    print(vars_temp)    
data_out <- compute(left_join(data, select(data, all_of(vars_temp)),         by 
= c(id = "id")))    return("Success!")})
   An irrecoverable exception occurred. R is aborting now ...
   
    *** caught segfault ***
   address 0x7f5e1daf00, cause 'invalid permissions'
   
   Traceback:
    1: Table__from_ExecPlanReader(self)
    2: x$read_table()
    3: as_arrow_table.RecordBatchReader(reader)
    4: as_arrow_table(reader)
    5: as_arrow_table.arrow_dplyr_query(x)
    6: as_arrow_table(x)
    7: doTryCatch(return(expr), name, parentenv, handler)
    8: tryCatchOne(expr, names, parentenv, handlers[[1L]])
    9: tryCatchList(expr, classes, parentenv, handlers)
   10: tryCatch(as_arrow_table(x), error = function(e, call = caller_env(n = 
4)) {    augment_io_error_msg(e, call, schema = schema())})
   11: compute.arrow_dplyr_query(left_join(data, select(data, 
all_of(vars_temp)),     by = c(id = "id")))
   12: compute(left_join(data, select(data, all_of(vars_temp)), by = c(id = 
"id")))
   13: FUN(X[[i]], ...)
   14: lapply(1:nb_var, function(n) {    print(paste0("Doing the join with ", n 
+ 1, " variables"))    vars_temp <- c("id", vars[1:n])    print(vars_temp)    
data_out <- compute(left_join(data, select(data, all_of(vars_temp)),         by 
= c(id = "id")))    return("Success!")})
   An irrecoverable exception occurred. R is aborting now ...
   
    *** caught segfault ***
   address 0x7ee0f0bac0, cause 'invalid permissions'
   
   Traceback:
    1: Table__from_ExecPlanReader(self)
    2: x$read_table()
    3: as_arrow_table.RecordBatchReader(reader)
    4: as_arrow_table(reader)
    5: as_arrow_table.arrow_dplyr_query(x)
    6: as_arrow_table(x)
    7: doTryCatch(return(expr), name, parentenv, handler)
    8: tryCatchOne(expr, names, parentenv, handlers[[1L]])
    9: tryCatchList(expr, classes, parentenv, handlers)
   10: tryCatch(as_arrow_table(x), error = function(e, call = caller_env(n = 
4)) {    augment_io_error_msg(e, call, schema = schema())})
   11: compute.arrow_dplyr_query(left_join(data, select(data, 
all_of(vars_temp)),     by = c(id = "id")))
   12: compute(left_join(data, select(data, all_of(vars_temp)), by = c(id = 
"id")))
   13: FUN(X[[i]], ...)
   14: lapply(1:nb_var, function(n) {    print(paste0("Doing the join with ", n 
+ 1, " variables"))    vars_temp <- c("id", vars[1:n])    print(vars_temp)    
data_out <- compute(left_join(data, select(data, all_of(vars_temp)),         by 
= c(id = "id")))    return("Success!")})
   An irrecoverable exception occurred. R is aborting now ...
   
    *** caught segfault ***
   address 0x7ecb394000, cause 'invalid permissions'
   
   Traceback:
    1: Table__from_ExecPlanReader(self)
    2: x$read_table()
    3: as_arrow_table.RecordBatchReader(reader)
    4: as_arrow_table(reader)
    5: as_arrow_table.arrow_dplyr_query(x)
    6: as_arrow_table(x)
    7: doTryCatch(return(expr), name, parentenv, handler)
    8: tryCatchOne(expr, names, parentenv, handlers[[1L]])
    9: tryCatchList(expr, classes, parentenv, handlers)
   10: tryCatch(as_arrow_table(x), error = function(e, call = caller_env(n = 
4)) {    augment_io_error_msg(e, call, schema = schema())})
   11: compute.arrow_dplyr_query(left_join(data, select(data, 
all_of(vars_temp)),     by = c(id = "id")))
   12: compute(left_join(data, select(data, all_of(vars_temp)), by = c(id = 
"id")))
   13: FUN(X[[i]], ...)
   14: lapply(1:nb_var, function(n) {    print(paste0("Doing the join with ", n 
+ 1, " variables"))    vars_temp <- c("id", vars[1:n])    print(vars_temp)    
data_out <- compute(left_join(data, select(data, all_of(vars_temp)),         by 
= c(id = "id")))    return("Success!")})
   An irrecoverable exception occurred. R is aborting now ...
   fish: Job 1, 'Rscript acero-join-test.R' terminated by signal SIGSEGV 
(Address boundary error)
   ```
   </details>

