N Gautam Animesh created ARROW-17802:
----------------------------------------

             Summary: Merging multi file datasets on particular columns that are present in all the datasets.
                 Key: ARROW-17802
                 URL: https://issues.apache.org/jira/browse/ARROW-17802
             Project: Apache Arrow
          Issue Type: Improvement
            Reporter: N Gautam Animesh


While working with multi-file datasets, I ran into a case where I wanted to merge specific columns that are present in all the datasets and then work on them. I was not able to do so. Is there a workaround for merging multi-file datasets on specific columns?

Please look into it and let me know if there is anything that addresses this.
{code:java}
library(arrow)
library(dplyr)

system.time({
  # open the multi-file dataset lazily, then read it into memory
  ds <- open_dataset('C:/Test/Files/test', format = "arrow")
  df <- ds %>% collect()
  # merging logic so as to select only specified column(s)
  # write_dataset(df, 'C:/Test/Files/test', format = "arrow")
})
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)