2009/6/9 Daniel Underwood djuatde...@gmail.com:
When I enter:
$ find *.pdf -print0 | xargs -0 pdftotext
nothing seems to happen. Although there is no error message, the text
files are not created. Any idea why?
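pdftotext handles one PDF per run. Its synopsis is `pdftotext [options] PDF-file [text-file]`, so when a command passes several names at once, a second name is read as the output file. Here is a minimal sketch that limits xargs to one name per invocation. It assumes a find(1) with `-maxdepth` (FreeBSD's find and GNU find both have it), and the function name `convert_all_pdfs` is hypothetical.

```shell
# Hypothetical helper: convert every PDF in one directory (default: the
# current one), running pdftotext once per file.
#   -print0 / -0 : pass names with spaces and parentheses intact
#   -n 1         : give each pdftotext run exactly one PDF, so it writes
#                  foo.txt next to foo.pdf instead of misreading extra names
convert_all_pdfs() {
    find "${1:-.}" -maxdepth 1 -name '*.pdf' -print0 | xargs -0 -n 1 pdftotext
}

# Usage: convert_all_pdfs ~/articles
```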
Ah, apologies. I was just testing with
$ find *.pdf -print0 | xargs -0 cat
to
Hmm... The command
find *.pdf -exec pdftotext {} \;
works in directories in which no PDF file returns the "Document has
not the mandatory ending %EOF" error. When a directory contains one
of these files, none of the files get converted. Is there some way to
ignore or skip over this %EOF problem?
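You can run pdftotext on each PDF separately and check its exit status. That way one damaged file cannot hide the rest, and you can see which files are at fault. This is a minimal sketch. The helper name `convert_each_pdf` is hypothetical, and the sketch assumes pdftotext exits non-zero when it cannot convert a file; its manual documents non-zero exit codes for failures such as being unable to open the PDF. Depending on the version, this particular %EOF message may only be a warning, so the sketch also counts a missing output file as a failure.

```shell
# Hypothetical helper: convert each PDF in a directory on its own, so a
# damaged file cannot stop the others, and report the files that fail.
convert_each_pdf() {
    failed=0
    for f in "${1:-.}"/*.pdf; do
        [ -e "$f" ] || continue              # the glob matched nothing
        txt=${f%.pdf}.txt
        if ! pdftotext "$f" "$txt" || [ ! -f "$txt" ]; then
            printf 'failed: %s\n' "$f" >&2
            failed=1
        fi
    done
    [ "$failed" -eq 0 ]                      # status 1 if any file failed
}

# Usage: convert_each_pdf ~/articles 2>failed.log
```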
On Mon, Jun 08, 2009 at 05:17:29PM -0400, Daniel Underwood wrote:
I'm looking for a way to manage my personal collection of research
articles. Ideally I'd like some way to keep records on authors,
keywords, journals, and publication years of articles (PDF files)
downloaded onto my local
On Mon, 8 Jun 2009 22:37:01 -0400 (EDT), vogelke+u...@pobox.com (Karl Vogel)
wrote:
Are these PDF files generated by scanning journal pages, or do they
contain text? If the latter, you could use something like xapian or
hyperestraier to make a full-text index of your files.
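A much cruder alternative to a real index is to convert every PDF once and grep the text files afterwards. This is only a cache of pdftotext output, not Xapian or Hyper Estraier. The sketch below is minimal. The function names and the cache layout (one `.txt` per PDF in a separate directory) are hypothetical, and it assumes the shell's `test` supports `-nt`, as FreeBSD sh, bash and dash do.

```shell
# Hypothetical helper: keep a plain-text copy of every PDF in a cache
# directory, converting only PDFs that are newer than their cached text.
update_text_cache() {
    src=$1 cache=$2
    mkdir -p "$cache" || return 1
    for pdf in "$src"/*.pdf; do
        [ -e "$pdf" ] || continue
        txt=$cache/$(basename "$pdf" .pdf).txt
        if [ ! -e "$txt" ] || [ "$pdf" -nt "$txt" ]; then
            # drop partial output so a failed file is retried next time
            pdftotext "$pdf" "$txt" || { rm -f "$txt"; printf 'skipped: %s\n' "$pdf" >&2; }
        fi
    done
}

# Hypothetical helper: list cached text files matching a pattern (case-insensitive).
search_text_cache() {
    grep -ril -- "$2" "$1"
}

# Usage: update_text_cache ~/articles ~/articles/.text
#        search_text_cache ~/articles/.text 'screening'
```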
On a
On Mon, 8 Jun 2009 23:11:50 -0400, Daniel Underwood djuatde...@gmail.com
wrote:
Since all the PDFs contain text (none are scanned images), can I
simply use some command like grep to search for text within the
collection? If so, how would I do this? Can grep read text from
within PDFs?
I
On 8 June 09, at 23:17, Daniel Underwood wrote:
I'm looking for a way to manage my personal collection of research
articles. Ideally I'd like some way to keep records on authors,
keywords, journals, and publication years of articles (PDF files)
downloaded onto my local drive.
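Part of that record keeping can start from the metadata stored in the PDFs themselves. pdfinfo ships with xpdf alongside pdftotext and prints a file's document information fields as `Key: value` lines. Publishers often leave these fields empty or unhelpful, so treat this as a starting point only. The helper name `pdf_meta` is hypothetical.

```shell
# Hypothetical helper: print selected metadata fields for one PDF.
pdf_meta() {
    pdfinfo "$1" 2>/dev/null |
        grep -E '^(Title|Author|Subject|Keywords|CreationDate):'
}

# Usage: one block of fields per article
# for f in *.pdf; do printf '== %s\n' "$f"; pdf_meta "$f"; done
```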
Hi Daniel,
I
I'm trying to convert all PDF files in a directory to text using
pdftotext. I tried the following command:
$ find *.pdf | xargs -0 pdftotext
Error: Couldn't open file 'Ross-JAMA-2007 (Prostate Screening Strategies).pdf
Sanda-JAMA-2009 (Prostate Cancer Treatment).pdf
'
Why is this not working?
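The error message shows what went wrong. Without `-print0`, find(1) ends each name with a newline, but `xargs -0` splits its input only at NUL bytes. The whole list therefore reaches pdftotext as a single file name, embedded newlines included. The difference is easy to see with printf standing in for find and using two sample names:

```shell
# Newline-separated names, as plain find(1) prints them:
# xargs -0 splits only at NUL, so both names (newlines included) form ONE argument.
printf 'Ross (2007).pdf\nSanda (2009).pdf\n' | xargs -0 -n 1 printf '[%s]\n'
# prints:
# [Ross (2007).pdf
# Sanda (2009).pdf
# ]

# NUL-separated names, as find -print0 prints them: one argument per file.
printf 'Ross (2007).pdf\0Sanda (2009).pdf\0' | xargs -0 -n 1 printf '[%s]\n'
# prints:
# [Ross (2007).pdf]
# [Sanda (2009).pdf]
```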
On Jun 8, 2009, at 5:17 PM, Daniel Underwood wrote:
I'm looking for a way to manage my personal collection of research
articles. Ideally I'd like some way to keep records on authors,
keywords, journals, and publication years of articles (PDF files)
downloaded onto my local drive.
In the
2009/6/9 Daniel Underwood djuatde...@gmail.com:
I'm trying to convert all PDF files in a directory to text using
pdftotext. I tried the following command:
$ find *.pdf | xargs -0 pdftotext
Error: Couldn't open file 'Ross-JAMA-2007 (Prostate Screening Strategies).pdf
Sanda-JAMA-2009
ill...@gmail.com wrote:
2009/6/9 Daniel Underwood djuatde...@gmail.com:
I'm trying to convert all PDF files in a directory to text using
pdftotext. I tried the following command:
$ find *.pdf | xargs -0 pdftotext
Error: Couldn't open file 'Ross-JAMA-2007 (Prostate Screening Strategies).pdf
$ find *.pdf -exec pdftotext {} \;
Error: Document has not the mandatory ending %EOF
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to
Daniel Underwood wrote:
$ find *.pdf -exec pdftotext {} \;
Error: Document has not the mandatory ending %EOF
Have you run pdftotext on a single file in your archive as a test?
--Joseph Lenox
Daniel Underwood wrote:
Yes, it works fine on most PDFs. There are a couple that give me:
$ pdftotext Sanda-JAMA-2009\ \(Prostate\ Cancer\ Treatment\).pdf
Error: Document has not the mandatory ending %EOF
It's probably an issue with the PDF itself, not with the program.
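One quick way to confirm that a file is damaged, for example by a truncated download, is to look for its end-of-file marker. The PDF format requires the last line of a file to be `%%EOF`, which is the marker the error message is complaining about. This minimal sketch scans the last 1024 bytes. That window size is an assumption that allows for a little trailing data after the marker. The helper name is hypothetical, and a file that passes can still be damaged in other ways.

```shell
# Hypothetical helper: list PDFs whose last 1024 bytes lack the %%EOF marker,
# a typical sign of a truncated or incomplete download.
find_missing_eof() {
    for f in "${1:-.}"/*.pdf; do
        [ -e "$f" ] || continue
        if ! tail -c 1024 "$f" | LC_ALL=C grep -q '%%EOF'; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage: find_missing_eof ~/articles
```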
--Joseph Lenox
On Tue, 09 Jun 2009 16:07:03 -0500, LoH lordofhyph...@gmail.com wrote:
Daniel Underwood wrote:
Yes, it works fine on most PDFs. There are a couple that give me:
$ pdftotext Sanda-JAMA-2009\ \(Prostate\ Cancer\ Treatment\).pdf
Error: Document has not the mandatory ending %EOF
I retrieved a fresh copy of the error-causing PDF, and now all is
well. Thanks for all the excellent help!
Daniel,
I'm trying to convert all PDF files in a directory to text using
pdftotext. I tried the following command:
Aside from the find(1) command syntax and some articles that may be
corrupted PDFs, you may consider hacking pdftotext to skip the
"do not print" flag in some of the PDF
Daniel Underwood wrote:
I'm looking for a way to manage my personal collection of research
articles. Ideally I'd like some way to keep records on authors,
keywords, journals, and publication years of articles (PDF files)
downloaded onto my local drive.
In the course of reading literature for
On Mon, 8 Jun 2009 17:17:29 -0400, Daniel Underwood djuatde...@gmail.com
wrote:
I'm looking for a way to manage my personal collection of research
articles. Ideally I'd like some way to keep records on authors,
keywords, journals, and publication years of articles (PDF files)
downloaded onto
I'm looking for a way to manage my personal collection of research
articles. Ideally I'd like some way to keep records on authors,
keywords, journals, and publication years of articles (PDF files)
downloaded onto my local drive.
In the course of reading literature for research, it often happens
Poly and LoH: Thanks, these are great ideas!
On Mon, Jun 08, 2009 at 05:17:29PM -0400, Daniel Underwood wrote:
I'm looking for a way to manage my personal collection of research
articles. Ideally I'd like some way to keep records on authors,
keywords, journals, and publication years of articles (PDF files)
downloaded onto my local
Daniel Underwood djuatde...@gmail.com wrote:
I'm looking for a way to manage my personal collection of research
articles. Ideally I'd like some way to keep records on authors,
keywords, journals, and publication years of articles (PDF files)
downloaded onto my local drive.
In the course
On Mon, 8 Jun 2009 17:45:38 -0400, Daniel Underwood djuatde...@gmail.com
wrote:
Poly and LoH: Thanks, these are great ideas!
I'd like to add that if you define your data fields well, you
can use it to generate BibTeX and other LaTeX entries from your
records.
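Here is a small awk sketch of generating BibTeX from well-defined fields. The record format (a tab-separated file with key, author, title, journal and year columns), the function name and the default file name `articles.tsv` are all hypothetical. A real collection would need its own fields and escaping of special LaTeX characters.

```shell
# Hypothetical record format: one article per line, tab-separated:
#   key <TAB> author <TAB> title <TAB> journal <TAB> year
# Prints one BibTeX @article entry per complete record (5+ fields).
records_to_bibtex() {
    awk -F '\t' 'NF >= 5 {
        printf "@article{%s,\n", $1
        printf "  author  = {%s},\n", $2
        printf "  title   = {%s},\n", $3
        printf "  journal = {%s},\n", $4
        printf "  year    = {%s}\n}\n\n", $5
    }' "${1:-articles.tsv}"
}

# Usage: records_to_bibtex articles.tsv > articles.bib
```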
You can even easily turn it into
On Mon, Jun 8, 2009 at 10:17 PM, Daniel Underwood djuatde...@gmail.com wrote:
I'm looking for a way to manage my personal collection of research
articles. Ideally I'd like some way to keep records on authors,
keywords, journals, and publication years of articles (PDF files)
downloaded onto my
Hi,
I'm looking for a way to manage my personal collection of research
articles. Ideally I'd like some way to keep records on authors,
keywords, journals, and publication years of articles (PDF files)
downloaded onto my local drive.
Certainly overkill, but dspace(.org) can keep up a digital
On Mon, 8 Jun 2009 17:17:29 -0400,
Daniel Underwood djuatde...@gmail.com said:
D In the course of reading literature for research, it often happens that I
D find myself wanting to return to something I have previously read, but I
D only recall a few things about the article, often the author
Since all the PDFs contain text (none are scanned images), can I
simply use some command like grep to search for text within the
collection? If so, how would I do this? Can grep read text from
within PDFs?
Since all the PDFs contain text (none are scanned images), can I
simply use some command like grep to search for text within the
collection? If so, how would I do this? Can grep read text from
within PDFs?
pdftotext; it comes with the xpdf port, I think.
Olivier
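grep cannot read a PDF's text directly, because page content is often stored in compressed streams. However, pdftotext writes the extracted text to standard output when the output file name is `-`, and grep can read it from there. This is a minimal sketch. The helper name `grep_pdfs` is hypothetical, and pdftotext's error messages are discarded for brevity.

```shell
# Hypothetical helper: print the names of PDFs whose extracted text
# matches a pattern (case-insensitive basic regular expression).
grep_pdfs() {
    pattern=$1
    shift
    for f in "$@"; do
        # "-" as the output file makes pdftotext write the text to stdout
        if pdftotext "$f" - 2>/dev/null | grep -qi -- "$pattern"; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage: grep_pdfs 'screening' ~/articles/*.pdf
```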
On Mon, Jun 8, 2009 at 10:21 PM, Olivier Nicole o...@cs.ait.ac.th wrote:
Since all the PDFs contain text (none are scanned images), can I
simply use some command like grep to search for text within the
collection? If so, how would I do this? Can grep read text from
within PDFs?
Daniel Underwood wrote:
A partial solution would also be to search someone else's index (Google
Scholar, IEEE Xplore, etc.) to get the title of what you're looking for.
True, but in this situation, I want to find something within a local
collection of literature. E.g., find a table of