Re: [Bug-wget] Fwd: Regarding wget to download webpage

2012-01-24 Thread Henrik Holst
I looked at your web site and it does not perform standard HTTP
authentication, so --user and --password cannot be used to log on to
that page.

You have to supply the username and password using --post-data there
as well. If you had followed my advice to use the Live HTTP Headers
extension for Firefox you would have seen exactly what to do; please use
that tool and learn some HTTP basics, and you will soon be able to do
what you want. Since we do not have access to that site of yours (no
username or password), we as a community will have a hard time telling
you exactly how to proceed, because we cannot test things at our end.

Anyway, as I wrote, I tested a logon attempt against the site, and with
the Live HTTP Headers extension active I could see that the
authentication should be performed like this:

wget --post-data "detour=https%3A%2F%2Fwww.collabnet.timeinc.net%2F&loginID=username&password=password&Login=Login" --save-cookies cookies.txt "https://www.collabnet.timeinc.net/servlets/TLogin"

Replace the username and password in the post data with your account
details.

However, since I have no account on that site of yours, I do not know
whether this really works, whether the detour=xxx parameter is really
needed, or whether you also have to add a Referer: header to the
request.
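
If a Referer: header does turn out to be needed, wget can send one with
--header. For example (the value here is only my guess; check what Live
HTTP Headers actually shows):

wget --header "Referer: https://www.collabnet.timeinc.net/" --post-data "..." --save-cookies cookies.txt "https://www.collabnet.timeinc.net/servlets/TLogin"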

So if I were you:
1. Install the Live HTTP Headers extension in Firefox
2. Open its capture window
3. Perform a logon and an SQL query
4. Study the results in the capture window and try to mimic what you see
with wget (a rough sketch of what that could look like follows below).
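
For reference, here is roughly what the two wget calls could end up
looking like. The field names and URLs come from my test above and may
well differ from what your capture window shows, so treat this only as a
sketch. Note also that if the site hands out session cookies (ones with
no expiry time), you need --keep-session-cookies, otherwise
--save-cookies writes out an empty file:

wget --keep-session-cookies --save-cookies cookies.txt --post-data "detour=https%3A%2F%2Fwww.collabnet.timeinc.net%2F&loginID=username&password=password&Login=Login" -O /dev/null "https://www.collabnet.timeinc.net/servlets/TLogin"

wget --load-cookies cookies.txt --post-data "query=SELECT%20%2A%20FROM%20mytable" -O result.html "https://www.collabnet.timeinc.net/servlets/AdHocQuery"

Here "query" and the SQL text are only placeholders; use whatever field
name and query the capture window shows.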

/HH


On Tue 2012-01-24 at 10:11 +0530, Bhargavi N wrote:
 Hello Henrik,
 
 
 I am saving the logon details in the cookies the same way as you
 mentioned.
 
 
 But the cookie file looks empty.
 
 
 Next I loaded the cookie.txt with the
 URL: https://www.collabnet.timeinc.net/servlets/AdHocQuery
 
 
 I saved the results page and then opened it in Firefox, but I get the
 same logon page to collabnet instead of the results page.
 
 
 Steps to run the query: 
 
 
 1) I need to log on to the collabnet website:
 https://www.collabnet.timeinc.net
 
 
 2) On the collabnet website, we have an option to run a query:
 
 
 https://www.collabnet.timeinc.net/servlets/AdHocQuery
 
 
 I need to go to this page and then provide the SQL query in the
 query text area.
 
 
 Then I need to press the Run Query button, which will submit the
 query.
 
 
 3) Next I need to download the results page to my local directory on
 the Linux box.
 
 
 Please help me regarding this.
 
 
 Thanks !
 
 
 Regards,
 Bhargavi
 
 
 On Mon, Jan 23, 2012 at 12:43 PM, Henrik Holst
 henrik.ho...@millistream.com wrote:
 Ok,
 
   most probably the first site (where you log on) returns a
 cookie which you must present to the other site (where you
 perform the SQL query). So
 
 wget --user xx --password yy "https://the collabnet website" -O /dev/null --save-cookies cookies.txt
 
 That will log on and save the resulting cookie in the
 cookies.txt file. Next you have to send this cookie and your
 query to the other site:
 
 wget --load-cookies cookies.txt "https://the query site" --post-data "the query" -O result
 
 And the result should be in the result file.
 
 Now, as Angel Gonzales wrote, you probably have to send the query
 using the form field, like --post-data "query=Select%20*%20From%20table",
 where the SQL query in question has to be URL-encoded (something that
 wget cannot do for you), which mostly means replacing all spaces with
 %20 and also all occurrences of +, &, " and ' with their respective
 %hex codes.
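
 If you do not want to do the encoding by hand, one way (assuming
 python3 is installed on your Linux box, and using "query" and the SQL
 text purely as examples) is to let Python produce the URL-encoded
 string for you:

 python3 -c 'import urllib.parse; print(urllib.parse.quote("SELECT * FROM mytable", safe=""))'

 That prints SELECT%20%2A%20FROM%20mytable, which you can paste straight
 into --post-data "query=SELECT%20%2A%20FROM%20mytable".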
 
 An easy way to see exactly which form field to use in the
 post data is to install the Live HTTP Headers extension in Firefox,
 open its capture window, and perform the query on the site
 using Firefox; you'll then get to see the exact request, what
 the GET string is, what the post data is, etc. It's a very good
 starting point.
 
 
 /HH
 
 2012/1/20 Bhargavi N bhagc...@gmail.com
 I meant I am unable to log on to the website.
 
 
 Sorry to confuse you with a lot of questions.
 
 
 I have to log on to the collabnet website first, providing a
 username and password.
 
 
 Next I want to go to the servlet page where I can provide the
 SQL query in the text area of the form.
 
 
 Then I need to submit the form and download the results page
 to my local directory on the Linux box.
 
 
 I will be running the wget command on the Linux box.
 
 
 Please help me regarding this.
  

[Bug-wget] Feature request/suggestion: option to pre-allocate space for files

2012-01-24 Thread markk
Hi,

This post is to suggest a new feature for wget: an option to pre-allocate
disk space for downloaded files. (Maybe have a --pre-allocate command-line
option?)

The ability to pre-allocate space for files would be useful for a couple
of reasons:

- By pre-allocating all space before downloading, the risk of exiting due
to a disk-full error is avoided. When downloading from a server which
doesn't support resuming downloads, an accidental disk full condition
means you have to re-download the whole file after freeing up some disk
space. That wastes a lot of time and network bandwidth.

- Disk fragmentation can be reduced. Downloading large files can take many
hours. While wget is downloading, other programs (web browser cache, email
client, etc.) can cause a lot of other disk activity. The result is that the
wget output file can end up unnecessarily fragmented, and likewise, files
written by other programs while wget is running end up more fragmented.

On Linux, fallocate() and posix_fallocate() can be used to pre-allocate
space. The advantage of fallocate() is that, by using the
FALLOC_FL_KEEP_SIZE flag, space is allocated but the apparent file size is
unchanged. That means resuming with --continue works as normal.
posix_fallocate(), on the other hand, sets the file length to its full
size, meaning that --continue won't work unless there were some way to
tell wget the byte offset it should continue from.
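
As a rough illustration of the difference, here is the same thing using the
fallocate(1) utility (its -n/--keep-size option maps onto the
FALLOC_FL_KEEP_SIZE behaviour; the file name and size are just examples):

# space reserved, apparent file size unchanged, so --continue works as normal
fallocate -n -l 700M file.iso

# without -n: like posix_fallocate(), the file length becomes 700M,
# so --continue would think the download is already complete
fallocate -l 700M file.iso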

The fallocate program (see man 1 fallocate) can be used to manually
pre-allocate space. For a single file that's a slight hassle but simple
enough. (Run wget to determine file length, break, use fallocate to
allocate space, then re-run wget.) But when using wget to download many
files in one session it's not really practical.
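
For a single file, the manual sequence could look something like this (the
URL and length are placeholders, and --spider is just one way to learn the
length without starting the transfer, assuming the server reports one):

# note the Length the server reports, e.g. 734003200 bytes
wget --spider http://example.com/big.iso
# reserve the space in the file wget will write to (apparent size stays 0)
fallocate -n -l 734003200 big.iso
# download; with -c wget appends into the pre-allocated big.iso
wget -c http://example.com/big.iso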

Of course, if the web server does not report the file size, it won't be
possible to pre-allocate space. Or would it...? Suppose the user is
downloading some CD ISO images from a server which does not report file
lengths. If the user could tell wget to pre-allocate 800MB for each file,
and then have wget call ftruncate() when each file has finished
downloading, that should achieve a result almost as good as if the server
did report file lengths.


-- Mark