Re: [Tutor] Using Python to access .txt files stored behind a firewall as .exe files

2017-05-02 Thread Alan Gauld via Tutor
On 02/05/17 19:06, Ian Monat wrote:

> I could give them reasons why .exe files won't work for me but they don't
> really care if I take the data files on their site or not. 

But do they care about their reputation?
The biggest issue here is not the technical one but the security
one: they could be laying themselves open to serious problems
if those exe files are ever tampered with, either by a "hacker"
or by a disgruntled employee.

Most large organizations are contemptuous of technical concerns
but brand image is everything!

> That said, I think my plan is to use requests to pull the .exe file down
> and then try to write a python script to extract the .zip without
> running the .exe. (maybe with pandas?)

I suspect not with pandas. pandas is a data analysis library,
and while it can read several file formats I doubt that
self-extracting zip files are among them! I think your best bet
is one of the PC zip archivers such as the one mentioned by
Mats, specifically one with a command line interface.

Then use the subprocess module in Python to run the extractor
and finally unzip the text file either via Python's zipfile module
or via the extractor program again.

Once you have the text file pandas may come into play to load
and analyze the data.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Using Python to access .txt files stored behind a firewall as .exe files

2017-05-02 Thread Ian Monat
Hi Steven,

Thanks for your commentary, made me laugh, I wish switching distributors
were that easy.

I could give them reasons why .exe files won't work for me but they don't
really care if I take the data files on their site or not. So I guess to
answer your question, we need them more.

That said, I think my plan is to use requests to pull the .exe file down
and then try to write a python script to extract the .zip without
running the .exe. (maybe with pandas?) I'm a beginner with python so we'll
see how it goes!

Thanks for your help -Ian



Re: [Tutor] Using Python to access .txt files stored behind a firewall as .exe files

2017-05-02 Thread Steven D'Aprano
On Mon, May 01, 2017 at 10:20:42AM -0700, Ian Monat wrote:
[...]
> Then you have to run the .exe which produces a zipped file, and inside the
> zipped file, is the .txt, which is what I really want. There's no way the
> distributor will change anything about how they store files on their
> website for me.  I've written a script using the requests module but I
> think a web scraper like Scrapy, Beautiful Soup or Selenium may be
> required.
> 
> What would you do?

Find another distributor.

(It's this sort of business-to-business incompetence that makes me laugh
when people say that private industry is always more efficient than the
alternatives. Did I say laugh? I meant cry.)

Seriously, can't you tell them that your anti-virus blocks the .exe 
files, and if they want you to use their system, they'll have to provide 
text files as text files?

Or tell them that you're using Apple Macs and the .exe files don't run 
under Mac.

I guess it depends on whether you need them more than they need you.

In any case, this isn't a problem that can be solved by a web scraper. 
The distributor's website provides .exe files. There's nothing you can 
do about that except complain or leave. The website gives you a .exe 
file, so that's what you receive.

However, once you have the .exe file in your possession, you *may* be 
able to hack open the file and extract the .zip file without running it. 
That will require detailed knowledge of how the .exe file does its job, 
but it is conceivable that it will work. A good low-level hacker could 
probably determine whether the zip file is embedded in the .exe or if it 
is generated on the fly. That's beyond my skills though.

If it is generated on the fly, you're screwed. You have no choice but to
run the .exe; until you do, the zip doesn't even exist. But if it is
embedded, it can be extracted, and once the zip file is extracted, 
Python can easily unzip it.
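
A minimal sketch of how to check the "embedded" case without any low-level hacking, assuming the download is an ordinary self-extracting archive (a program stub with a zip appended). Python's zipfile module locates the archive from the end of the file, so it can usually read such a file directly. The function name and the fake in-memory "exe" below are made up for illustration; nothing is known about the distributor's actual files.

```python
import io
import zipfile


def open_embedded_zip(exe_path_or_file):
    """Return a ZipFile if the .exe carries a zip archive, else None.

    The file is only read as data; nothing in it is executed.
    """
    if not zipfile.is_zipfile(exe_path_or_file):
        return None
    return zipfile.ZipFile(exe_path_or_file)


# Stand-in for a real download: fake "program" bytes with a zip appended.
payload = io.BytesIO()
with zipfile.ZipFile(payload, "w") as zf:
    zf.writestr("report.txt", "hello from inside the exe\n")
fake_exe = io.BytesIO(b"MZ" + b"\x00" * 512 + payload.getvalue())

archive = open_embedded_zip(fake_exe)
names = archive.namelist() if archive else []
text = archive.read("report.txt").decode("ascii") if archive else ""
print(names, text)
```

If this returns None for the real file, the zip is probably compressed or generated in some other way, and an external archiver is the next thing to try.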



-- 
Steve


Re: [Tutor] Using Python to access .txt files stored behind a firewall as .exe files

2017-05-02 Thread Mats Wichmann
On 05/01/2017 06:12 PM, Ian Monat wrote:
> Thank you for the reply Mats.
> 
> I agree the fact that files are wrapped in an .exe is ridiculous. We're
> talking about a $15B company that is doing this by the way, not a ma and pa
> shop.  Anyways...
> 
> If I understand you correctly, you're saying I can:
> 
> 1) Use Python to download the file from the web (but not by using a
> webscraper, according to Alan)
> 2) Simply ignore the .exe wrapper and use, maybe Windows Task Manager, to
> unzip the file and place the .txt file in the desired folder
> 
> Am I understanding you correctly?
> 
> Thank you -Ian

Once you figure out the filename to download - and this may require some
scraping of the page - I'm thinking something like:

import subprocess

downloaded_filename = "something.exe"  # whatever you found to download
# Pass the command as a list so paths with spaces need no extra quoting.
cmd = ["7z", "x", downloaded_filename]
res = subprocess.check_output(cmd)

# examine res to make sure there was something to extract,
# then go on and fish file(s) out of the zipfile

I have nothing to experiment with this on, so it's just "thinking out loud".

I have at some point used 7z (and most of the other Windows archivers
that you'd consider "third party" can probably do something similar)
interactively to fish a zip out of an exe, and I know of no reason it
wouldn't work from the command line.
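
A slightly fuller sketch of the same idea, assuming the 7-Zip command line program is installed and reachable as "7z" on PATH. The function names and the output folder are hypothetical; the switches used are 7-Zip's "x" (extract with full paths), "-o" (output directory) and "-y" (answer yes to prompts).

```python
import shutil
import subprocess


def build_7z_command(archive, out_dir, exe="7z"):
    """Build the argument list for '7z x' (extract with full paths).

    -o<dir> sets the output directory (no space after -o);
    -y answers "yes" to any prompts so the script cannot hang.
    """
    return [exe, "x", "-y", "-o" + str(out_dir), str(archive)]


def extract_with_7z(archive, out_dir, exe="7z"):
    """Unpack `archive` with 7-Zip, treating it purely as data."""
    if shutil.which(exe) is None:
        raise FileNotFoundError(exe + " was not found on PATH")
    # check=True raises CalledProcessError if 7z reports a failure.
    done = subprocess.run(build_7z_command(archive, out_dir, exe),
                          stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          universal_newlines=True, check=True)
    return done.stdout
```

Calling extract_with_7z("something.exe", "unpacked") would return 7z's console output, which can then be examined before looking for the zip or text file in the "unpacked" folder.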



Re: [Tutor] Using Python to access .txt files stored behind a firewall as .exe files

2017-05-01 Thread Alan Gauld via Tutor
On 02/05/17 01:12, Ian Monat wrote:

> 1) Use Python to download the file from the web (but not by using a
> webscraper, according to Alan)

Things like BeautifulSoup will help you read the HTML and
extract links etc but they won't help you actually fetch
the file/documents from the web site. A package like
requests is the correct tool for that.
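
To illustrate the "read the HTML and extract links" half of that split, here is a rough sketch using only the standard library's html.parser as a stand-in for BeautifulSoup. The class name, the sample HTML and the distributor.example URL are all invented for the example; the real page layout is unknown. The URLs it finds would then be handed to requests for the actual download.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ExeLinkFinder(HTMLParser):
    """Collect the href of every <a> tag that points at a .exe file."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".exe"):
            # Turn relative links into absolute URLs.
            self.links.append(urljoin(self.base_url, href))


def find_exe_links(html, base_url):
    finder = ExeLinkFinder(base_url)
    finder.feed(html)
    return finder.links


sample = '<a href="/files/sales_may.exe">May</a> <a href="help.html">Help</a>'
found = find_exe_links(sample, "https://distributor.example/downloads/")
print(found)
```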

> 2) Simply ignore the .exe wrapper and use, maybe Windows Task Manager, to
> unzip the file and place the .txt file in the desired folder

I don't think Task Manager will do it but third party tools
exist that can, and they cope with self-extracting zip files
too. You can execute those programs from Python rather
than directly executing the unsafe download, and if it's not
a legitimate zip file they will usually just issue a warning.

Once you have a regular zip file the standard Python zipfile module
can extract the text file.
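
A small sketch of that last step, assuming the zip holds one or more .txt members and that the goal is to drop them into a single folder. The function name and the folder layout are assumptions for illustration only.

```python
import zipfile
from pathlib import Path


def extract_text_files(zip_path, dest_dir):
    """Copy every .txt member of the zip into dest_dir; return the new paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".txt"):
                continue
            # Keep only the base name so a member cannot escape dest_dir.
            target = dest / Path(info.filename).name
            target.write_bytes(zf.read(info))
            written.append(target)
    return written
```

Pointing dest_dir at the dashboard's input folder would make the extracted text files available to whatever program reads that folder.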


-- 
Alan G


Re: [Tutor] Using Python to access .txt files stored behind a firewall as .exe files

2017-05-01 Thread Ian Monat
Thank you for the reply Mats.

I agree the fact that files are wrapped in an .exe is ridiculous. We're
talking about a $15B company that is doing this by the way, not a ma and pa
shop.  Anyways...

If I understand you correctly, you're saying I can:

1) Use Python to download the file from the web (but not by using a
webscraper, according to Alan)
2) Simply ignore the .exe wrapper and use, maybe Windows Task Manager, to
unzip the file and place the .txt file in the desired folder

Am I understanding you correctly?

Thank you -Ian



Re: [Tutor] Using Python to access .txt files stored behind a firewall as .exe files

2017-05-01 Thread Mats Wichmann
On 05/01/2017 03:44 PM, Alan Gauld via Tutor wrote:
> On 01/05/17 18:20, Ian Monat wrote:
>> ...  I've written a script using the requests module but I
>> think a web scraper like Scrapy, Beautiful Soup or Selenium may be
>> required.
> 
> I'm not sure what you are looking for. Scrapy, BS etc will
> help you read the HTML but not to fetch the file. Also do
> you want to process the file (extract the text) in Python
> too, or is it enough to just fetch the file?
> 
> If the problem is with reading the HTML then you need to
> give us more detail about the problem areas and HTML
> format.
> 
> If the problem is fetching the file, it sounds like you
> have already done that and it should be a case of fine
> tuning/tidying up the code you've written.
> 
> What kind of help exactly are you asking for?
> 

This is a completely non-Python, non-Tutor response to part of this:

The self-extracting archive. Convenience, at a price: running
executables of unverified reliability is just a terrible idea.

I know you said your disty won't change their website, but you should
tell them they should: a tremendous number of organizations have
policies that just don't allow pulling down and running an exe file from
a website. Even if that's not currently the case for you, you could say
that you're not allowed, and get someone in your management chain to
promise to support that if there's a question - should not be hard. It
may be wired into the distributor's content delivery system, but that's
a stupid choice on their part.

"Then you have to run the .exe which produces a zipped file"

Don't do this ("run"), unless there's a way you trust to be able to
verify the security of what is offered. Just about any payload could be
buried in the exe, especially if someone broke into the distributor's site.

Possibly slightly pythonic:

If it is really just a wrapper for a zipfile (i.e. the aforementioned
self-extracting archive), you should be able to open it in 7zip or
similar, and extract the zipfile, without ever "running" it.  And if
that is the case, you should be able to script extracting the zipfile
from the .exe, and then extracting the text file from the zipfile, using
Python (or other scripting languages: that's not particularly
Python-specific).


Re: [Tutor] Using Python to access .txt files stored behind a firewall as .exe files

2017-05-01 Thread Ian Monat
Hi Alan, thanks for the reply.

My goal is to use Python to automatically download the .exe, unzip it, and
place the new .txt in a folder on my OneDrive.

Then I have another visualization program that loads all the .txt files in
that folder and displays them in a web-dashboard.

My sales team has access to the dashboard through Sharepoint. So, I'm
trying to automate the input to the dashboard so the team is always
updated, without taking any of my time.

Thanks for your time and thoughts -Ian



Re: [Tutor] Using Python to access .txt files stored behind a firewall as .exe files

2017-05-01 Thread Alan Gauld via Tutor
On 01/05/17 18:20, Ian Monat wrote:
> ...  I've written a script using the requests module but I
> think a web scraper like Scrapy, Beautiful Soup or Selenium may be
> required.

I'm not sure what you are looking for. Scrapy, BS etc will
help you read the HTML but not to fetch the file. Also do
you want to process the file (extract the text) in Python
too, or is it enough to just fetch the file?

If the problem is with reading the HTML then you need to
give us more detail about the problem areas and HTML
format.

If the problem is fetching the file, it sounds like you
have already done that and it should be a case of fine
tuning/tidying up the code you've written.

What kind of help exactly are you asking for?

-- 
Alan G