[ https://issues.apache.org/jira/browse/ARROW-9676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173740#comment-17173740 ]
Nick DiQuattro edited comment on ARROW-9676 at 8/9/20, 3:54 AM:
----------------------------------------------------------------

Sure, not sure how to share a reprex of this exactly, but here's how the offending column is displayed after calling `open_dataset()`:

{{ address_cass: struct<analysis: struct<active: string, dpv_cmra: string, dpv_footnotes: string, dpv_match_code: string, dpv_vacant: string, footnotes: string, lacslink_code: string, lacslink_indicator: string>, canidate_index: int64, components: struct<city_name: string, default_city_name: string, delivery_point: string, delivery_point_check_digit: string, extra_secondary_designator: string, extra_secondary_number: string, plus4_code: string, pmb_designator: string, pmb_number: string, primary_number: string, secondary_designator: string, secondary_number: string, state_abbreviation: string, street_name: string, street_postdirection: string, street_predirection: string, street_suffix: string, zipcode: string>, delivery_line_1: string, delivery_line_2: string, delivery_point_barcode: string, input_index: int64, last_line: string, metadata: struct<building_default_indicator: string, carrier_route: string, congressional_district: string, county_fips: string, county_name: string, dst: bool, elot_sequence: string, latitude: double, longitude: double, precision: string, rdi: string, record_type: string, time_zone: string, utc_offset: string, zip_type: string>>}}

For the crash, after calling `collect()` it hangs until a "Previous R session was abnormally terminated" pop-up comes up.
In trying it out again for this update, I tried collecting without any select functions (`opends() %>% collect()`) and these errors showed up before the crash:

{{ Error in Table__to_dataframe(x, use_threads = option_use_threads()) : }}
{{ Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'NULL' }}
{{ Error during wrapup: C stack usage 597002227520 is too close to the limit }}
{{ Error: no more error handlers available (recursive errors?); invoking 'abort' restart }}
{{ Error: Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'NULL' }}
{{ Error during wrapup: C stack usage 597019012928 is too close to the limit }}
{{ Error: no more error handlers available (recursive errors?); invoking 'abort' restart }}
{{ Warning: stack imbalance in '.Call', 29 then 30 }}
{{ Warning: stack imbalance in '{', 25 then 26 }}
{{ Warning: stack imbalance in 'if', 23 then 20 }}
{{ Warning: stack imbalance in '!', 22 then 19 }}
{{ Error: Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'NULL' }}
{{ Error during wrapup: C stack usage 597010620000 is too close to the limit }}
{{ Error: no more error handlers available (recursive errors?); invoking 'abort' restart }}
{{ Warning: stack imbalance in '!', 15 then 17 }}

Happy to provide any more information, thanks!
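For readers following along, the call patterns discussed in this issue (dropping the struct column, or filtering rows, before `collect()`) might look roughly like the sketch below. It is only an illustration under stated assumptions: the tiny dataset, its values, and the `id` column are made up, the struct is far smaller and shallower than the real `address_cass` column, and it is not expected to reproduce the crash.

```r
library(arrow)
library(dplyr)

# Hypothetical stand-in data: a data.frame column is written as an Arrow struct.
# The real address_cass column is much larger and has nested structs inside it.
df <- tibble::tibble(
  id = 1:3,
  address_cass = tibble::tibble(
    last_line = c("a", "b", "c"),
    input_index = 1:3
  )
)

# Write a one-file Parquet dataset into a temporary directory.
ds_dir <- tempfile("structs_")
dir.create(ds_dir)
write_parquet(df, file.path(ds_dir, "part-0.parquet"))

ds <- open_dataset(ds_dir)

# Pattern 1: select out the struct column before collecting.
without_struct <- ds %>%
  select(-address_cass) %>%
  collect()

# Pattern 2: keep the struct column but filter the rows down first.
filtered <- ds %>%
  filter(id <= 2L) %>%
  collect()
```

In both cases the work of dropping columns or rows happens before the Arrow data is converted to R objects, which is the step where the errors in this comment are raised.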
> [R] Option to import structs as lists
> -------------------------------------
>
> Key: ARROW-9676
> URL: https://issues.apache.org/jira/browse/ARROW-9676
> Project: Apache Arrow
> Issue Type: New Feature
> Components: R
> Affects Versions: 1.0.0
> Environment: Amazon Linux, 32gb of ram
> Reporter: Nick DiQuattro
> Priority: Major
>
> When trying to collect data from a dataset based on parquet files with nested
> structs (column is a struct with 2 structs nested) of moderate size (1Mish
> rows), R crashes. If I add a filter to reduce the number of rows, the data is
> parsed. If I select out the struct column, it works great (up to 21M rows).
> My hunch is the structs resulting in data.frame columns may be the issue. I
> am curious if there's a way to have arrow import structs as lists instead of
> data.frames.
> Thanks for the direction to here [~neilr8133]!

--
This message was sent by Atlassian Jira
(v8.3.4#803005)