Re: [Tutor] How to Scrape Text from PDFs
On 17/06/2019 06:30, Cem Vardar wrote:
> some PDF files that have links for some websites and I need to extract these
> links

There is a module that may help: PyPDF2

Here is a post showing how to extract the text from a PDF which should
include the links.

https://stackoverflow.com/questions/34837707/how-to-extract-text-from-a-pdf-file

There may even be more specific extraction tools if you look more closely...

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
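As a rough illustration of the PyPDF2 suggestion above, here is a minimal sketch that pulls the text out of each page and then searches it for URLs. It assumes PyPDF2 2.0 or later is installed (older releases spell the reader class differently), and the names find_urls, urls_from_pdf and URL_PATTERN are made up for this example. The URL pattern is deliberately crude. Note also that a link stored only as a clickable annotation, and not as visible page text, may not appear in the extracted text at all.

```python
import re

# Crude URL pattern: the scheme, then everything up to whitespace,
# an angle bracket, a parenthesis or a double quote.
URL_PATTERN = re.compile(r'https?://[^\s<>()"]+')


def find_urls(text):
    """Return the URLs found in text, in order, without duplicates."""
    found = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in found:
            found.append(url)
    return found


def urls_from_pdf(path):
    """Extract the text of every page with PyPDF2 and collect the URLs."""
    # Imported here so find_urls() stays usable without PyPDF2.
    from PyPDF2 import PdfReader  # assumption: PyPDF2 >= 2.0

    reader = PdfReader(path)
    urls = []
    for page in reader.pages:
        text = page.extract_text() or ""  # guard against pages with no text
        for url in find_urls(text):
            if url not in urls:
                urls.append(url)
    return urls
```

Calling urls_from_pdf("some_file.pdf") (a placeholder file name) would return a list of strings, one per distinct URL found in the page text.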
Re: [Tutor] Installing Python v3 on a laptop Windows 10 (SOLVED)
On 15/06/2019 22:23, Ken Green wrote:
> I understood there is a preferable method of installing Python into Windows.
> I pray tell on how about to do it, gentlemen.

Thank you, gentlemen, for the prompt responses to my inquiry. I believe it
would be best for me to use the ActiveState installation for my laptop. I like
that Microsoft is trying to make it easy to download Python, but I am not sure
whether it has been fully implemented yet.

Again, thanks guys.

Ken Green
Re: [Tutor] How to Scrape Text from PDFs
> On Jun 17, 2019, at 1:30 AM, Cem Vardar wrote:
>
> Hello,
>
> I have been working on an assignment that was described to me as “fairly
> trivial” for a couple of days now. I have some PDF files that have links for
> some websites and I need to extract these links from these files by using
> Python. I would be very glad if someone could point me in the direction of
> some resources that would give me the essential skills specific for this task.
>

Unfortunately, a PDF can contain anything from almost PostScript to a bitmap.
But let’s assume your PDFs are of the almost-PostScript flavor. In that case
you can simply read them as text and then use Python’s standard string
searching to look for http:// or https://. Each time you find one, stop and
parse the URL (again with string handling), looking for one of the typical
terminators (e.g. .com, .net, .org).

It might help to cheat a bit: open one of the PDFs in a standard text editor
and search for http:// to see what turns up. I’ll bet it will be fairly clear.

Bill

> Sincerely,
> Cem
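The read-it-as-text idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes the URLs are stored uncompressed in the file (if the relevant objects are compressed, nothing will turn up), and the names RAW_URL, urls_in_raw_bytes and urls_in_file are invented for the example. One deliberate difference from the description above: instead of looking for terminators such as .com or .net, the sketch stops at whitespace or at a delimiter such as a closing parenthesis, so that any path after the domain is kept.

```python
import re

# Bytes pattern: the scheme, then everything up to whitespace, an angle
# bracket, a parenthesis or a double quote. PDF literal strings are
# wrapped in parentheses, so ")" is a natural stopping point.
RAW_URL = re.compile(rb'https?://[^\s<>()"]+')


def urls_in_raw_bytes(data):
    """Return the URLs found in raw PDF bytes, in order, without duplicates."""
    found = []
    for match in RAW_URL.findall(data):
        url = match.decode("ascii", errors="replace")
        if url not in found:
            found.append(url)
    return found


def urls_in_file(path):
    """Read a file in binary mode and search its bytes for URLs."""
    with open(path, "rb") as handle:
        return urls_in_raw_bytes(handle.read())
```

Reading in binary mode avoids decoding errors, since a PDF is generally not valid text in any single encoding.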
[Tutor] How to Scrape Text from PDFs
Hello,

I have been working on an assignment that was described to me as “fairly
trivial” for a couple of days now. I have some PDF files that have links for
some websites and I need to extract these links from these files by using
Python. I would be very glad if someone could point me in the direction of
some resources that would give me the essential skills specific for this task.

Sincerely,
Cem