[jira] [Comment Edited] (ARROW-9676) [R] Option to import structs as lists

Nick DiQuattro (Jira) Sat, 08 Aug 2020 20:55:51 -0700


    [ https://issues.apache.org/jira/browse/ARROW-9676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173740#comment-17173740 ]


Nick DiQuattro edited comment on ARROW-9676 at 8/9/20, 3:54 AM:
----------------------------------------------------------------

Sure. I'm not sure how to share a reprex of this exactly, but here's how the 
offending column is displayed after calling `open_dataset()`:


{{ address_cass: struct<analysis: struct<active: string, dpv_cmra: string, 
dpv_footnotes: string, dpv_match_code: string, dpv_vacant: string, footnotes: 
string, lacslink_code: string, lacslink_indicator: string>, canidate_index: 
int64, components: struct<city_name: string, default_city_name: string, 
delivery_point: string, delivery_point_check_digit: string, 
extra_secondary_designator: string, extra_secondary_number: string, plus4_code: 
string, pmb_designator: string, pmb_number: string, primary_number: string, 
secondary_designator: string, secondary_number: string, state_abbreviation: 
string, street_name: string, street_postdirection: string, street_predirection: 
string, street_suffix: string, zipcode: string>, delivery_line_1: string, 
delivery_line_2: string, delivery_point_barcode: string, input_index: int64, 
last_line: string, metadata: struct<building_default_indicator: string, 
carrier_route: string, congressional_district: string, county_fips: string, 
county_name: string, dst: bool, elot_sequence: string, latitude: double, 
longitude: double, precision: string, rdi: string, record_type: string, 
time_zone: string, utc_offset: string, zip_type: string>>}}


As for the crash: after calling `collect()`, the session hangs until a 
"Previous R session was abnormally terminated" pop-up comes up.

While trying it out again for this update, I collected without any select 
functions (`opends() %>% collect()`), and these errors showed up before the crash:


{{ Error in Table__to_dataframe(x, use_threads = option_use_threads()) : }}
{{   Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'NULL'
Error during wrapup: C stack usage 597002227520 is too close to the limit
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
Error: Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'NULL'
Error during wrapup: C stack usage 597019012928 is too close to the limit
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
Warning: stack imbalance in '.Call', 29 then 30
Warning: stack imbalance in '{', 25 then 26
Warning: stack imbalance in 'if', 23 then 20
Warning: stack imbalance in '!', 22 then 19
Error: Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'NULL'
Error during wrapup: C stack usage 597010620000 is too close to the limit
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
Warning: stack imbalance in '!', 15 then 17}}


 Happy to provide any more information, thanks!


was (Author: ndiquattro):
Sure, not sure how to share a reprex of this exactly, but here's how the 
offending column is displayed after calling `open_dataset()`:
address_cass: struct<analysis: struct<active: string, dpv_cmra: string, 
dpv_footnotes: string, dpv_match_code: string, dpv_vacant: string, footnotes: 
string, lacslink_code: string, lacslink_indicator: string>, canidate_index: 
int64, components: struct<city_name: string, default_city_name: string, 
delivery_point: string, delivery_point_check_digit: string, 
extra_secondary_designator: string, extra_secondary_number: string, plus4_code: 
string, pmb_designator: string, pmb_number: string, primary_number: string, 
secondary_designator: string, secondary_number: string, state_abbreviation: 
string, street_name: string, street_postdirection: string, street_predirection: 
string, street_suffix: string, zipcode: string>, delivery_line_1: string, 
delivery_line_2: string, delivery_point_barcode: string, input_index: int64, 
last_line: string, metadata: struct<building_default_indicator: string, 
carrier_route: string, congressional_district: string, county_fips: string, 
county_name: string, dst: bool, elot_sequence: string, latitude: double, 
longitude: double, precision: string, rdi: string, record_type: string, 
time_zone: string, utc_offset: string, zip_type: string>>
For the crash, after calling `collect()` it hangs until a "Previous R session 
was abnormally terminated" pop-up comes up.

In trying it out again for this update, I tried collecting without any select 
functions (opends() %>% collect()) and these errors showed up before the crash:
Error in Table__to_dataframe(x, use_threads = option_use_threads()) : 
  Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'NULL'
Error during wrapup: C stack usage  597002227520 is too close to the limit
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
Error: Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'NULL'
Error during wrapup: C stack usage  597019012928 is too close to the limit
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
Warning: stack imbalance in '.Call', 29 then 30
Warning: stack imbalance in '{', 25 then 26
Warning: stack imbalance in 'if', 23 then 20
Warning: stack imbalance in '!', 22 then 19
Error: Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'NULL'
Error during wrapup: C stack usage  597010620000 is too close to the limit
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
Warning: stack imbalance in '!', 15 then 17
Happy to provide any more information, thanks!

> [R] Option to import structs as lists
> -------------------------------------
>
>                 Key: ARROW-9676
>                 URL: https://issues.apache.org/jira/browse/ARROW-9676
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: R
>    Affects Versions: 1.0.0
>         Environment: Amazon Linux, 32gb of ram
>            Reporter: Nick DiQuattro
>            Priority: Major
>
> When trying to collect data from a dataset based on Parquet files with nested 
> structs (the column is a struct with 2 structs nested inside it) of moderate 
> size (roughly 1M rows), R crashes. If I add a filter to reduce the number of 
> rows, the data is parsed. If I select out the struct column, it works great 
> (up to 21M rows). My hunch is that the structs resulting in data.frame 
> columns may be the issue. I am curious whether there's a way to have arrow 
> import structs as lists instead of data.frames. Thanks for pointing me here, 
> [~neilr8133]!
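
To make the request above concrete, here is a minimal base R sketch of the two shapes being compared: a struct held as a data.frame column versus the same values held as a list column. It does not call arrow at all. The tiny `address` column and the helper `struct_to_list()` are hypothetical stand-ins invented for illustration, not part of the arrow API or of the reporter's data.

```r
# Hypothetical miniature of a struct column. The description above says structs
# currently end up as data.frame columns, so we build one by hand.
df <- data.frame(id = 1:3)
df$address <- data.frame(
  city = c("Austin", "Boise", "Salem"),
  zipcode = c("73301", "83702", "97301"),
  stringsAsFactors = FALSE
)

# struct_to_list() is a hypothetical helper (not an arrow function): it turns
# a data.frame column into a list with one named list per row.
struct_to_list <- function(x) {
  lapply(seq_len(nrow(x)), function(i) as.list(x[i, , drop = FALSE]))
}

# The same values, now stored as a list column instead of a data.frame column.
df$address_list <- struct_to_list(df$address)
```

This only illustrates the difference in representation. Building one small list per row is likely to be slow at the row counts mentioned in the report, and the helper does not recurse into structs nested inside structs.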



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
