Please suggest an efficient way (or code) to split text into sentences in R.
I am currently using the openNLP package for this, but it takes several hours
to process 8,000+ records of Twitter posts/comments.

Below is my R code:

# Give the JVM more heap; this must be set before rJava/openNLP is loaded.
options(java.parameters = "-Xmx4g")

library("NLP"); library("openNLPdata"); library("openNLP")

# Create the annotator once and reuse it for every record.
sentence_token_annotator <- Maxent_Sent_Token_Annotator()

# Split one text into its sentences.
convert_text_to_sentences <- function(text) {
  text <- as.String(text)
  sentence.boundaries <- annotate(text, sentence_token_annotator)
  sentences <- text[sentence.boundaries]
  return(sentences)
}

system.time(textofcomment_list <- lapply(data_all$TEXT,
                                         convert_text_to_sentences))
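
For comparison, a much cheaper but cruder baseline would be a rule-based split in base R. This is only a sketch and not the openNLP model: the helper name split_sentences_regex is made up for illustration, and it assumes a sentence ends at ".", "!" or "?" followed by whitespace, so abbreviations such as "e.g." and tweets without punctuation will not be handled well.

```r
# Hypothetical helper (not part of openNLP): naive rule-based sentence
# splitter that works on a whole character vector at once.
split_sentences_regex <- function(text) {
  text <- as.character(text)  # also accepts factors
  # Split at whitespace preceded by ., ! or ? (lookbehind needs perl = TRUE).
  parts <- strsplit(text, "(?<=[.!?])\\s+", perl = TRUE)
  # Drop any empty pieces left over from the split.
  lapply(parts, function(p) p[nzchar(p)])
}

# Toy input standing in for data_all$TEXT.
comments <- c("I love this. Do you? Yes!", "No punctuation here")
sentences_list <- split_sentences_regex(comments)
```

Timing this with system.time() on the same records would at least show how much of the run time comes from the maxent annotator itself.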

Thanks in advance

--
View this message in context: 
http://r.789695.n4.nabble.com/Sentence-Splitting-using-R-s-openNLP-library-is-not-efficient-tp4696694.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.