On 5/17/2015 10:17 AM, Thomas 'PointedEars' Lahn wrote:
C.D. Reimer wrote:

Consider using a regular expression or the urllib package instead. See RFC 3986, Appendix B, and <https://docs.python.org/3/library/urllib.html>, respectively.

That wouldn't work for me. I'm in the process of converting a WordPress website into a static website.

I wrote a script that pulled the HTML content from the SQL file and saved each post to a text file named after its URL (e.g., "2015-01-01-this-is-a-slug.html"). That created 275 files in the source folder.

Since I'm using Grav CMS (http://getgrav.org/) for my static website, I wrote a script to get the file names from the source folder, slice each file name into its components (year, month, day, slug, and a title derived from the slug), convert the HTML into Markdown, and copy the content into a file called item.md inside a new folder (e.g., 20150101.this-is-a-slug) in the destination folder.
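
A minimal sketch of that slicing step, assuming every file name follows the "YYYY-MM-DD-slug.html" pattern shown above. The function name, the dictionary keys, and the rule for deriving the title (hyphens to spaces, then title case) are illustrative assumptions, not the actual script; the HTML-to-Markdown conversion is left out.

```python
import re

# Assumed pattern: YYYY-MM-DD-slug.html, as in "2015-01-01-this-is-a-slug.html".
FILENAME_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)\.html$")


def split_post_filename(filename):
    """Split a post file name into year, month, day, slug, title, and folder."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    year, month, day, slug = match.groups()
    return {
        "year": year,
        "month": month,
        "day": day,
        "slug": slug,
        # Assumption: the title is the slug with hyphens turned into spaces.
        "title": slug.replace("-", " ").title(),
        # Destination folder name in the form 20150101.this-is-a-slug.
        "folder": f"{year}{month}{day}.{slug}",
    }


parts = split_post_filename("2015-01-01-this-is-a-slug.html")
print(parts["folder"])
```

Anchoring the pattern at both ends means a stray file in the source folder raises an error instead of being sliced into nonsense.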

After I finish cleaning up the 275 item.md files in a Markdown editor, I'll write another script to create an .htaccess file that redirects each old URL (e.g., /2015/01/01/this-is-a-slug) to its new URL (e.g., /blog/this-is-a-slug).
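
That last script could look roughly like the sketch below. It assumes the same "YYYY-MM-DD-slug.html" file names as the input and assumes Apache's mod_alias "Redirect" directive with a permanent (301) status is what the server should use; the function names are hypothetical.

```python
def redirect_rule(filename):
    """Build one redirect line from a 'YYYY-MM-DD-slug.html' file name."""
    stem = filename[: -len(".html")]
    # Split on the first three hyphens only, so hyphens in the slug survive.
    year, month, day, slug = stem.split("-", 3)
    # Assumption: a permanent redirect via mod_alias is wanted.
    return f"Redirect 301 /{year}/{month}/{day}/{slug} /blog/{slug}"


def build_htaccess(filenames):
    """Join one redirect line per post into the text of an .htaccess file."""
    return "\n".join(redirect_rule(name) for name in sorted(filenames)) + "\n"


print(build_htaccess(["2015-01-01-this-is-a-slug.html"]))
```

Writing the returned text to .htaccess is left to the caller, which keeps the rule-building part easy to check by eye before it goes live.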

Gotta love string manipulations. ;)

Thank you,

Chris Reimer
--
https://mail.python.org/mailman/listinfo/python-list