Re: [Tutor] Using Python to access .txt files stored behind a firewall as .exe files
On 02/05/17 19:06, Ian Monat wrote:
> I could give them reasons why .exe files won't work for me but they don't
> really care if I take the data files on their site or not.

But do they care about their reputation? The biggest issue here is not the technical one but the security one: they could be laying themselves open to serious problems if those exe files are ever tampered with, either by a "hacker" or by a disgruntled employee. Most large organizations are contemptuous of technical concerns, but brand image is everything!

> That said, I think my plan is to use requests to pull the .exe file down
> and then try to write a python script to extract the .zip without
> running the .exe. (maybe with pandas?)

I suspect not with pandas. pandas is a statistical data processing library and, while it can read several file formats, I doubt self-extracting zip files are among them!

I think your best bet is one of the PC zip archivers, such as 7-Zip, which Mats mentioned, and specifically one with a command line interface. Then use the subprocess module in Python to run the extractor, and finally unzip the text file either via Python's zipfile module or via the extractor program again.

Once you have the text file, pandas may come into play to load and analyze the data.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
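A minimal sketch of the final step Alan describes, assuming the external extractor has already left a regular .zip on disk: the standard zipfile module can pull out just the .txt members. The function name extract_text_files is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_text_files(zip_path, dest_dir):
    """Extract only the .txt members of a regular zip file into dest_dir.

    Members keep any folder structure they have inside the archive.
    Returns the paths of the files written.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".txt"):
                # extract() returns the path it wrote, as a string
                written.append(Path(zf.extract(name, str(dest))))
    return written
```

For example, extract_text_files("unpacked/data.zip", "text") (hypothetical paths) would write any text files into a "text" folder, after which pandas or the csv module can load them.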
Re: [Tutor] Using Python to access .txt files stored behind a firewall as .exe files
Hi Steven,

Thanks for your commentary, it made me laugh. I wish switching distributors were that easy. I could give them reasons why .exe files won't work for me, but they don't really care if I take the data files on their site or not. So I guess, to answer your question, we need them more.

That said, I think my plan is to use requests to pull the .exe file down and then try to write a python script to extract the .zip without running the .exe. (maybe with pandas?)

I'm a beginner with python so we'll see how it goes!

Thanks for your help
-Ian
Re: [Tutor] Using Python to access .txt files stored behind a firewall as .exe files
On Mon, May 01, 2017 at 10:20:42AM -0700, Ian Monat wrote:
[...]
> Then you have to run the .exe which produces a zipped file, and inside the
> zipped file is the .txt, which is what I really want. There's no way the
> distributor will change anything about how they store files on their
> website for me. I've written a script using the requests module but I
> think a web scraper like Scrapy, Beautiful Soup or Selenium may be
> required.
>
> What would you do?

Find another distributor.

(It's this sort of business-to-business incompetence that makes me laugh when people say that private industry is always more efficient than the alternatives. Did I say laugh? I meant cry.)

Seriously, can't you tell them that your anti-virus blocks the .exe files, and if they want you to use their system, they'll have to provide text files as text files?

Or tell them that you're using Apple Macs and the .exe files don't run under Mac.

I guess it depends on whether you need them more than they need you.

In any case, this isn't a problem that can be solved by a web scraper. The distributor's website provides .exe files. There's nothing you can do about that except complain or leave. The website gives you a .exe file, so that's what you receive.

However, once you have the .exe file in your possession, you *may* be able to hack open the file and extract the .zip file without running it. That will require detailed knowledge of how the .exe file does its job, but it is conceivable that it will work. A good low-level hacker could probably determine whether the zip file is embedded in the .exe or generated on the fly. That's beyond my skills though.

If it is generated on the fly, you're screwed: you have no choice but to run the .exe, because until you do, the zip doesn't even exist. But if it is embedded, it can be extracted, and once the zip file is extracted, Python can easily unzip it.
--
Steve
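As a rough first probe of Steven's "embedded or generated on the fly?" question (a heuristic sketch, not a reliable test): zip data contains fixed byte signatures, PK\x03\x04 at the start of each stored file and PK\x05\x06 in the end-of-central-directory record, so scanning the downloaded bytes for both hints that a zip is sitting inside the .exe. A hit can be a coincidence, and a miss may just mean the payload is packed in some other format. The function names are hypothetical.

```python
from pathlib import Path

LOCAL_HEADER = b"PK\x03\x04"        # starts each stored file in a zip
END_OF_CENTRAL_DIR = b"PK\x05\x06"  # record near the end of every zip


def find_embedded_zip(data):
    """Return (start, end_record) byte offsets if zip signatures are found, else None.

    start is the first local file header; end_record is the last
    end-of-central-directory record. Both are heuristics only.
    """
    start = data.find(LOCAL_HEADER)
    end_record = data.rfind(END_OF_CENTRAL_DIR)
    if start == -1 or end_record == -1 or end_record < start:
        return None
    return start, end_record


def scan_file(path):
    """Read the downloaded file as raw bytes (never executing it) and scan it."""
    return find_embedded_zip(Path(path).read_bytes())
```

A pair of offsets from scan_file("download.exe") is a reason to try the zipfile or 7-Zip routes; None suggests the zip may really be produced only at run time.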
Re: [Tutor] Using Python to access .txt files stored behind a firewall as .exe files
On 05/01/2017 06:12 PM, Ian Monat wrote:
[...]
> If I understand you correctly, you're saying I can:
>
> 1) Use Python to download the file from the web (but not by using a
> webscraper, according to Alan)
> 2) Simply ignore the .exe wrapper and use, maybe Windows Task Manager, to
> unzip the file and place the .txt file in the desired folder
>
> Am I understanding you correctly?

Once you figure out the filename to download (and this may require some scraping of the page), I'm thinking something like:

    import subprocess

    downloaded_filename = "something.exe"  # whatever you found to download
    # a list of arguments works on every OS and needs no shell quoting
    cmd = ["7z", "x", downloaded_filename]
    res = subprocess.check_output(cmd, universal_newlines=True)
    # examine res to make sure there was something to extract,
    # then go on and fish file(s) out of the zipfile

I have nothing to experiment with this on, so it's just "thinking out loud". I have at some point used 7z interactively to fish a zip out of an exe (and most of the other Windows archivers that you'd consider "third party" can probably do something similar), and there's no reason I know of that it wouldn't work from the command line.
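Fleshing out Mats's "examine res" step, here is a sketch that assumes 7-Zip's command-line tool is installed, either on the PATH as 7z or at its usual Windows location (an assumption; adjust to your install). Instead of parsing 7z's console output, it checks the exit code and then looks at what actually landed in the output folder. All names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: the usual 7-Zip install location on Windows, used only if
# `7z` is not already on the PATH.
DEFAULT_WINDOWS_7Z = r"C:\Program Files\7-Zip\7z.exe"


def find_7z():
    """Return a usable 7-Zip executable path, or raise FileNotFoundError."""
    found = shutil.which("7z")
    if found:
        return found
    if Path(DEFAULT_WINDOWS_7Z).is_file():
        return DEFAULT_WINDOWS_7Z
    raise FileNotFoundError("7-Zip command-line tool not found")


def extract_with_7z(exe_path, out_dir):
    """Unpack exe_path into out_dir with `7z x` and return the .zip files produced.

    -o names the output folder (no space after it); -y answers yes to prompts.
    Any non-zero exit code is treated as a failure, which is deliberately strict.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(
        [find_7z(), "x", str(exe_path), "-o" + str(out), "-y"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    if result.returncode != 0:
        raise RuntimeError("7z failed:\n" + result.stdout)
    return find_zips(out)


def find_zips(folder):
    """List the .zip files anywhere under folder, sorted for repeatability."""
    return sorted(p for p in Path(folder).rglob("*") if p.suffix.lower() == ".zip")
```

Checking the folder rather than the text of res keeps the script independent of 7-Zip's console messages, which are written for humans to read.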
Re: [Tutor] Using Python to access .txt files stored behind a firewall as .exe files
On 02/05/17 01:12, Ian Monat wrote:
> 1) Use Python to download the file from the web (but not by using a
> webscraper, according to Alan)

Things like BeautifulSoup will help you read the HTML and extract links etc., but they won't help you actually fetch the files/documents from the web site. A package like requests is the correct tool for that.

> 2) Simply ignore the .exe wrapper and use, maybe Windows Task Manager, to
> unzip the file and place the .txt file in the desired folder

I don't think Task Manager will do it, but third-party tools exist that can, and they cope with self-extracting zip files too. And you can execute those programs from Python rather than directly executing the unsafe download; if it's not a legitimate zip file, they will usually just issue a warning.

Once you have a regular zip file, the standard Python zipfile module can extract the text file.

--
Alan G
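To make Alan's split concrete: parsing the page only finds the link, it does not fetch anything. Here is a sketch using the standard library's html.parser as a stand-in for BeautifulSoup (swapped in so the example needs no extra install), collecting absolute URLs of links that end in .exe. The assumption that the download links end in .exe, and the names used, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ExeLinkFinder(HTMLParser):
    """Collect absolute URLs of <a href="..."> links ending in .exe."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the extension
        if href and href.split("?")[0].lower().endswith(".exe"):
            self.links.append(urljoin(self.base_url, href))


def find_exe_links(html, base_url):
    finder = ExeLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.links
```

The html string would come from Ian's existing requests code (for example requests.get(page_url).text), and each returned URL is then fetched separately.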
Re: [Tutor] Using Python to access .txt files stored behind a firewall as .exe files
Thank you for the reply Mats.

I agree the fact that files are wrapped in an .exe is ridiculous. We're talking about a $15B company that is doing this, by the way, not a mom-and-pop shop. Anyways...

If I understand you correctly, you're saying I can:

1) Use Python to download the file from the web (but not by using a webscraper, according to Alan)
2) Simply ignore the .exe wrapper and use, maybe Windows Task Manager, to unzip the file and place the .txt file in the desired folder

Am I understanding you correctly?

Thank you -Ian
Re: [Tutor] Using Python to access .txt files stored behind a firewall as .exe files
On 05/01/2017 03:44 PM, Alan Gauld via Tutor wrote:
> On 01/05/17 18:20, Ian Monat wrote:
>> ... I've written a script using the requests module but I
>> think a web scraper like Scrapy, Beautiful Soup or Selenium may be
>> required.
> [...]
> What kind of help exactly are you asking for?

This is a completely non-Python, non-Tutor response to part of this:

The self-extracting archive. Convenience, at a price: running executables of unverified reliability is just a terrible idea.

I know you said your distributor won't change their website, but you should tell them they should: a tremendous number of organizations have policies that don't allow just pulling down and running an exe file from a website. Even if that's not currently the case for you, you could say that you're not allowed, and get someone in your management chain to promise to back that up if there's a question; that should not be hard. It may be wired into the distributor's content delivery system, but that's a stupid choice on their part.

"Then you have to run the .exe which produces a zipped file"

Don't do this ("run") unless there's a way you trust to verify the security of what is offered. Just about any payload could be buried in the exe, especially if someone broke into the distributor's site.

Possibly slightly pythonic:

If it is really just a wrapper for a zipfile (i.e. the aforementioned self-extracting archive), you should be able to open it in 7-Zip or similar and extract the zipfile without ever "running" it. And if that is the case, you should be able to script extracting the zipfile from the .exe, and then extracting the text file from the zipfile, using Python (or other scripting languages; that's not particularly Python-specific).
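If the .exe really is a small stub program with a zip archive appended (how many self-extracting archives are built, though not all), Python's own zipfile module may be able to read it without 7-Zip, because it locates the archive from the end-of-central-directory record at the end of the file and allows for extra data in front. A sketch under that assumption, which also looks inside any nested .zip members, since Ian describes the .exe producing a zip that contains the .txt. If the wrapper is some other format, zipfile.is_zipfile returns False and an external tool like 7-Zip is the fallback. The function names are hypothetical and the UTF-8 default is a guess about the distributor's files.

```python
import io
import zipfile


def _collect_text(zf, encoding, texts):
    """Add every .txt in zf to texts, looking inside nested .zip members too."""
    for name in zf.namelist():
        lower = name.lower()
        if lower.endswith(".txt"):
            texts[name] = zf.read(name).decode(encoding)
        elif lower.endswith(".zip"):
            # The .exe may wrap an inner zip; open it in memory
            with zipfile.ZipFile(io.BytesIO(zf.read(name))) as inner:
                _collect_text(inner, encoding, texts)


def read_text_from_sfx(exe_path, encoding="utf-8"):
    """Return {member name: text} for the .txt files in a zip-based .exe.

    The .exe is only read as bytes, never executed. Raises ValueError if
    zipfile cannot find a zip archive inside it.
    """
    if not zipfile.is_zipfile(exe_path):
        raise ValueError("{} does not look like a zip-based archive".format(exe_path))
    texts = {}
    with zipfile.ZipFile(exe_path) as zf:
        _collect_text(zf, encoding, texts)
    return texts
```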
Re: [Tutor] Using Python to access .txt files stored behind a firewall as .exe files
Hi Alan, thanks for the reply.

My goal is to use Python to automatically download the .exe, unzip it, and place the new .txt in a folder on my OneDrive. Then I have another visualization program that loads all the .txt files in that folder and displays them in a web dashboard. My sales team has access to the dashboard through SharePoint.

So, I'm trying to automate the input to the dashboard so the team is always up to date, without taking any of my time.

Thanks for your time and thoughts
-Ian
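For the last hop in Ian's pipeline, a sketch of dropping the extracted text into the folder the dashboard reads. The folder path is a hypothetical OneDrive location and publish_text is a hypothetical name. Writing to a temporary .part name first and then swapping it into place with os.replace means a program that loads every .txt in the folder should never pick up a half-written file, as long as both names are on the same drive.

```python
import os
from pathlib import Path

# Hypothetical location of the OneDrive-synced folder the dashboard reads
DASHBOARD_FOLDER = Path.home() / "OneDrive" / "SalesDashboard"


def publish_text(text, filename, folder=DASHBOARD_FOLDER, encoding="utf-8"):
    """Write text into folder/filename without exposing a half-written file.

    The data goes to a ".part" file first (which a loader that only picks up
    *.txt will ignore), then os.replace swaps it into place in one step,
    overwriting any previous version.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    final = folder / filename
    partial = folder / (filename + ".part")
    partial.write_text(text, encoding=encoding)
    os.replace(str(partial), str(final))
    return final
```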
Re: [Tutor] Using Python to access .txt files stored behind a firewall as .exe files
On 01/05/17 18:20, Ian Monat wrote:
> ... I've written a script using the requests module but I
> think a web scraper like Scrapy, Beautiful Soup or Selenium may be
> required.

I'm not sure what you are looking for. Scrapy, BS etc. will help you read the HTML but not fetch the file. Also, do you want to process the file (extract the text) in Python too, or is it enough to just fetch the file?

If the problem is with reading the HTML, then you need to give us more detail about the problem areas and the HTML format.

If the problem is fetching the file, it sounds like you have already done that, and it should be a case of fine-tuning/tidying up the code you've written.

What kind of help exactly are you asking for?

--
Alan G
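On the fetching side Alan mentions, Ian's script already uses requests; as a dependency-free sketch of the same step, the standard library's urllib.request can stream a download straight to disk without holding it all in memory. The function name download is hypothetical, and the .exe is only saved, never run.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url, dest_path, timeout=60):
    """Stream the resource at url into dest_path and return its size in bytes.

    The downloaded file is only written to disk, never executed.
    """
    dest = Path(dest_path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as response, open(str(dest), "wb") as out:
        # copyfileobj reads and writes in chunks rather than all at once
        shutil.copyfileobj(response, out)
    return dest.stat().st_size
```

With requests, the equivalent is requests.get(url, stream=True) and writing out resp.iter_content(chunk_size=...) in a loop.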