Hello everyone,

Thanks, everyone. I finally found a solution while searching on the things you all had explained. There is a docx gem for parsing docx files and docx-html for converting them to HTML.

    require 'docx'

    d = Docx::Document.open('example.docx')
    d.each_paragraph do |p|
      puts p
    end

And for a docx file stored on Amazon S3:

    require 'open-uri'
    require 'openssl'

    # VERIFY_NONE turns off SSL certificate checking for this request
    Docx::Document.open(open('http://S3-URL/original.docx', :ssl_verify_mode => OpenSSL::SSL::VERIFY_NONE))

A big thanks to all.

On Sun, Sep 16, 2012 at 9:42 PM, Walter Lee Davis <wa...@wdstudio.com> wrote:

> For a start, here's the man page for catdoc, which you will need to
> install.
>
> http://linux.die.net/man/1/catdoc
>
> Then, read up on using the system() or backtick operators in a Ruby script
> to engage it. You'll need to have a path to the file you want to process,
> which is highly dependent on the system you're using to store the files. In
> Paperclip, I made this processor to extract text from PDF files (pdftotext
> is part of the same collection of utilities as catdoc, I believe):
>
> #lib/paperclip_processors/text.rb
>
> module Paperclip
>   # Handles extracting plain text from PDF file attachments
>   class Text < Processor
>
>     attr_accessor :whiny
>
>     # Creates a Text extract from PDF
>     def make
>       src = @file
>       dst = Tempfile.new([@basename, 'txt'].compact.join("."))
>       command = <<-end_command
>         "#{ File.expand_path(src.path) }"
>         "#{ File.expand_path(dst.path) }"
>       end_command
>
>       begin
>         success = Paperclip.run("/usr/bin/pdftotext -nopgbrk", command.gsub(/\s+/, " "))
>         Rails.logger.info "Processing #{src.path} to #{dst.path} in the text processor."
>       rescue PaperclipCommandLineError
>         raise PaperclipError, "There was an error processing the text for #{@basename}" if @whiny
>       end
>       dst
>     end
>   end
> end
>
> Depending on how you are uploading your files, your mileage may vary. At
> the very simplest, the command would be
>
> text_contents = system('/usr/bin/catdoc /root/relative/path/to/file.doc')
>
> But that's hopelessly naive and will blow up on any error.
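
A note on the catdoc one-liner quoted above: in Ruby, system() returns true, false, or nil rather than the program's output, so capturing the extracted text needs backticks or Open3. Here is a minimal sketch of a slightly more defensive version, assuming catdoc is installed at /usr/bin/catdoc (the path from Walter's example). The names extract_text and TextExtractionError are made up for illustration and are not part of any gem.

```ruby
require 'open3'

# Hypothetical error class for this sketch (not from Paperclip or catdoc).
class TextExtractionError < StandardError; end

# Hypothetical helper: runs a text-extraction command on a file and returns
# whatever the command prints to standard output.
def extract_text(path, command: '/usr/bin/catdoc')
  raise TextExtractionError, "No such file: #{path}" unless File.file?(path)

  # Passing the program and its argument separately skips the shell,
  # so spaces or quotes in the path need no escaping.
  stdout, stderr, status = Open3.capture3(command, path)
  unless status.success?
    raise TextExtractionError,
          "#{command} failed (exit #{status.exitstatus}): #{stderr.strip}"
  end

  stdout
rescue Errno::ENOENT
  # Raised when the command itself cannot be found.
  raise TextExtractionError, "Command not found: #{command}"
end
```

Usage would then look like `text_contents = extract_text('/root/relative/path/to/file.doc')`, with a `rescue TextExtractionError` around it wherever a failed conversion should be handled instead of crashing.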
>
> Walter
>
> On Sep 16, 2012, at 6:16 AM, rovin varshney wrote:
>
> > Hi Walter Lee Davis, Paul
> >
> > Please can you give some code snippet or some more clarification about
> > parsing doc files.
> >
> > On Sat, Sep 15, 2012 at 7:37 PM, Scott Ribe <scott_r...@elevated-dev.com> wrote:
> > On Sep 15, 2012, at 7:27 AM, Paul wrote:
> >
> > > The docx format is actually pretty simple...
> >
> > You are really cruel to toy with him like that ;-)
> >
> > --
> > Scott Ribe
> > scott_r...@elevated-dev.com
> > http://www.elevated-dev.com/
> > (303) 722-0567 voice

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk@googlegroups.com.
To unsubscribe from this group, send email to rubyonrails-talk+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.