I would perhaps also look at https://pandoc.org, which can convert between a number of document formats...
And for spreadsheets https://github.com/jqnatividad/qsv is my go-to weapon. It can also read and write XLSX and other formats.

A sample document or two would always be helpful...

el

On 29/12/2023 21:01, CALUM POLWART wrote:
> It sounded like he looked at officer but I would agree
>
> content <- officer::docx_summary(officer::read_docx("filename.docx"))
>
> Would get the text content into an object called content.
>
> That object is a data.frame so you can then manipulate it.
> To be more specific, we might need an example of the DF
[...]
>> On Fri, Dec 29, 2023 at 10:14 AM Andy <phaedr...@gmail.com>
>> wrote:
[...]
>>> I'd like to be able to accomplish the following:
>>>
>>> (1) Append the title, the month, the author, the number of
>>> words, and page number(s) to a spreadsheet
>>>
>>> (2) Read each article and extract keywords (in the docs,
>>> these are listed in 'Subject' section as a list of
>>> keywords with a percentage showing the extent to which the
>>> keyword features in the article (e.g., FAST FASHION (72%)))
>>> and to append the keyword and the % coverage to the same
>>> row in the spreadsheet. However, I want to ensure that
>>> the keyword coverage meets the threshold of >= 50%; if
>>> not, then pass on to the next article in the directory.
>>> Rinse and repeat for the entire directory.
[...]
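
Without a sample document this is only a guess, but the threshold step in item (2) could be sketched in base R roughly as follows. It assumes the 'Subject' text has already been pulled out as a single string and that every keyword is written as KEYWORD (NN%). The function name extract_keywords, the separator, and all keywords other than FAST FASHION (72%) are made up for illustration.

```r
# Hypothetical 'Subject' string; only "FAST FASHION (72%)" comes from the
# thread, the other keywords and the ";" separator are assumptions.
subject <- "FAST FASHION (72%); RETAILERS (55%); SUSTAINABILITY (48%)"

# Return a data.frame of keywords whose coverage is >= threshold.
extract_keywords <- function(subject, threshold = 50) {
  # Match "KEYWORD (NN%)": keyword text followed by a percentage in parentheses.
  pattern <- "([^;,()]+?)\\s*\\(([0-9]{1,3})%\\)"
  m <- regmatches(subject, gregexpr(pattern, subject, perl = TRUE))[[1]]
  if (length(m) == 0) {
    return(data.frame(keyword = character(0), coverage = numeric(0)))
  }
  # Split each match into the keyword and its numeric coverage.
  keyword  <- trimws(sub("\\s*\\([0-9]{1,3}%\\)$", "", m, perl = TRUE))
  coverage <- as.numeric(sub("^.*\\(([0-9]{1,3})%\\)$", "\\1", m, perl = TRUE))
  out <- data.frame(keyword = keyword, coverage = coverage,
                    stringsAsFactors = FALSE)
  # Keep only keywords that meet the threshold.
  out[out$coverage >= threshold, , drop = FALSE]
}

kw <- extract_keywords(subject)
print(kw)
```

If kw comes back with zero rows, the article can simply be skipped; otherwise the rows could be appended to a CSV (for example with write.table(..., append = TRUE)) and handed to qsv or a spreadsheet from there.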