Re: [Bug-wget] PATCH: Fix FTBFS on GNU/Hurd

2017-03-11 Thread Tim Rühsen
Hi Svante,

On Friday, 10 March 2017 14:20:56 CET Svante Signell wrote:
> Hello,
> 
> wget has not built from source on GNU/Hurd since Debian version 1.18-4.
> The cause is that HAVE_PTHREAD_RWLOCK_RDLOCK_PREFER_WRITER is not defined
> by configure, and the code then assumes that the function
> pthread_rwlockattr_setkind_np() is available. On GNU/Hurd it is not. The
> Hurd libpthread is built from the sources in glibc/libpthread/*, not in
> glibc/nptl.
> 
> The attached patch fixes the build problems by conditioning on __GNU__,
> which is unique to GNU/Hurd. However, a better solution would probably be
> to detect in configure.ac or m4/*.m4 whether the function
> pthread_rwlockattr_setkind_np() is available, or to check whether
> TPS+SCHED_FIFO/SCHED_RR is supported by pthread_rwlock_rdlock(), see below.
> 
> As written in the comments of m4/pthread_rwlock_rdlock.m4 POSIX-2008 only
> requires this for specific implementations:
> 
> dnl POSIX:2008 makes this requirement only for implementations that support TPS
> dnl (Thread Priority Scheduling) and only for the scheduling policies SCHED_FIFO
> dnl and SCHED_RR, see
> dnl http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_rwlock_rdlock.html
> dnl but test verifies the guarantee regardless of TPS and regardless of
> dnl scheduling policy.
> 
> Thank you for your attention.

You should address the gnulib project directly. Your patch touches gnulib
files, which are imported/generated during the Wget build.

So please write to bug-gnu...@gnu.org. I am sure your patches are welcome,
and that way all projects using gnulib will benefit in the future.

Regards, Tim




[Bug-wget] [bug #50514] Convert Links touching embedded Javascript

2017-03-11 Thread Ages Ayemtwo
URL:
  

         Summary: Convert Links touching embedded Javascript
         Project: GNU Wget
    Submitted by: ages2500
    Submitted on: Sat 11 Mar 2017 07:34:12 PM UTC
        Category: Program Logic
        Severity: 3 - Normal
        Priority: 5 - Normal
          Status: None
         Privacy: Public
     Assigned to: None
 Originator Name: 
Originator Email: 
     Open/Closed: Open
 Discussion Lock: Any
         Release: 1.18
Operating System: GNU/Linux
 Reproducibility: Every Time
   Fixed Release: None
 Planned Release: None
      Regression: No
   Work Required: None
  Patch Included: No

___

Details:

The link conversion process, and possibly the recursive retrieval process,
touches embedded JavaScript code and mangles it. Historically, I believe
wget left JavaScript alone.

I tried to fetch the following page using:


wget -rkE -l inf -P wget_test -D dimensionality.com \
  http://www.dimensionality.com/freebeeexamples/freebieexample1-javascript.html


The original source code includes the following line:


document.write("http://www.dimensionality.com/freebeeexamples/\"rw1";);"


Ideally this line should be left as is, untouched.
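
To illustrate the behavior being asked for, here is a minimal sketch in Python (not wget's actual code) of a link scanner that only looks at real tag attributes and leaves script bodies alone. The class and function names are made up for this example. It relies on the standard library's html.parser, which hands the content of <script> elements to the data handler instead of parsing it as markup.

```python
from html.parser import HTMLParser


class AttributeLinkCollector(HTMLParser):
    """Collect href/src values from real tags only (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only attributes of parsed tags are seen here. Text inside
        # <script> arrives via handle_data and is ignored by this class.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def collect_links(html_text):
    """Return the link targets found in tag attributes of html_text."""
    parser = AttributeLinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

A link conversion pass built on such a list would rewrite page2.html in `<a href="page2.html">` but would never see a URL that only appears inside a document.write() string.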

Tested with wget v1.16 compiled from the Savannah sources on Debian
GNU/Linux 7, and with the packaged wget v1.18 on Cygwin.




___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/




[Bug-wget] [bug #50516] domain.com vs www.domain.com site duplication

2017-03-11 Thread Ages Ayemtwo
URL:
  

         Summary: domain.com vs www.domain.com site duplication
         Project: GNU Wget
    Submitted by: ages2500
    Submitted on: Sat 11 Mar 2017 08:01:57 PM UTC
        Category: Feature Request
        Severity: 3 - Normal
        Priority: 5 - Normal
          Status: None
         Privacy: Public
     Assigned to: None
 Originator Name: 
Originator Email: 
     Open/Closed: Open
 Discussion Lock: Any
         Release: None
Operating System: None
 Reproducibility: None
   Fixed Release: None
 Planned Release: None
      Regression: None
   Work Required: None
  Patch Included: No

___

Details:

When retrieving http://www.domain.com/, the site author may link a file to
domain.com, without the www. This also occurs when the opposite is true.

Either scenario results in the website being downloaded twice, creating a
haphazard mesh of file links between:

/domain.com/

and

/www.domain.com/

It also means that 404 pages will link to http://domain.com/ in the html of
files of one folder, and http://www.domain.com/ in the other.

Even if one overlooks the local mess this creates, it still puts extra
strain on a large wget process, which crawls and downloads nearly twice as
much data as it needs to.

Restricting the site to -D www.domain.com runs the risk of missing data. To
ensure I get all of the data from the domain in question, I use -D
domain.com.

It would be nice to have an extra flag that makes wget treat domain.com and
www.domain.com content the same and store it in the same folder, without
content duplication.

I am not requesting that this feature be a default function, but rather an
additional flag/feature that treats www.domain.com and domain.com as coming
from the same domain.
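
As a rough sketch of the comparison such a flag could make (Python, purely illustrative; no such option exists in wget and the function names are invented here), both hosts are reduced to one canonical form before deciding where a file belongs. It assumes that the bare domain and its www. variant really do serve the same content, which is why it would have to be opt-in.

```python
from urllib.parse import urlsplit


def canonical_host(url):
    """Return the lowercased host of url with one leading "www." removed."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return host


def same_site(url_a, url_b):
    """True if both URLs map to the same canonical host (hypothetical rule)."""
    return canonical_host(url_a) == canonical_host(url_b)
```

With this rule, http://www.domain.com/a.html and http://domain.com/a.html would map to the same local folder, domain.com/, and would be fetched only once.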

The following URL will exhibit this behavior in wget:


wget -rkE -np -l inf -D runequake.com http://www.runequake.com/






