N Gautam Animesh created ARROW-17802:
----------------------------------------

             Summary: Merging multi file datasets on particular columns that 
are present in all the datasets.
                 Key: ARROW-17802
                 URL: https://issues.apache.org/jira/browse/ARROW-17802
             Project: Apache Arrow
          Issue Type: Improvement
            Reporter: N Gautam Animesh


While working with multi-file datasets, I ran into a case where I wanted to 
merge specific columns from all of the datasets and work on them.
I was not able to do so. Is there any workaround for merging multi-file 
datasets on specific columns?
Please look into it and let me know if there is anything available for this.
{code:java}
library(arrow)
library(dplyr)

system.time({
  # Open the multi-file dataset and read it into memory
  df <- open_dataset('C:/Test/Files/test', format = "arrow")
  df <- df %>% collect()
  # merging logic so as to select only specified column(s)
  # write_dataset(df, 'C:/Test/Files/test', format = "arrow")
})
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
