I think the reason Google will not find it is that, on the Journal website, the R files (and the names of the article directories that might contain them, such as journal.sjdm.org/8210/ -- see below) are not directly pointed to by any index.html or by any link (<a href="...">) on the website, as far as I can see. This would be why 'wget' cannot find them in HTTP mode, and it would also prevent Google from being led to them.
On the other hand, if one knows the name of a directory, then a wget on that directory will assemble its list of contents into an "index.html" file on the local machine, from which the names of any ".R" files can be extracted with a bit of greppery. For example,

  wget http://journal.sjdm.org/8210/

creates a local file "index.html", and then

  grep '[.]R' index.html

outputs:

  <tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="probs.R">probs.R</a></td><td align="right">09-Dec-2008 14:37 </td><td align="right">1.0K</td></tr>
  <tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="test.R">test.R</a></td><td align="right">23-May-2008 05:46 </td><td align="right">251 </td></tr>

thus revealing the two R files "probs.R" and "test.R" which are there. Then a bit of seddery (or the like) could probably extract just the filenames, by looking for *.R between > and <.

However, the key to the whole thing is knowing what the numerical directory names (such as "8210") are. The only way I've found to do this automatically is to download the whole site (Linux commands):

  mkdir sjdm
  cd sjdm
  wget -r -k -np -nH http://journal.sjdm.org/

extract the numeric directory-names with (e.g.):

  find . -type d -name '[0-9]*[0-9]' -print

and then work through the results of this with directory-specific wget's as before.

This all seems to be overkill, however! Much easier if the site would accept FTP.

Ted.

On 07-Jun-09 11:45:19, Gabor Grothendieck wrote:
> The fact that the search did find two files suggests that
> it works but the problem may be that google has just not
> indexed those other files. Try entering the url for one of
> them into google and google still does not find it.
> http://journal.sjdm.org/8210/test.R
>
>
> On Sun, Jun 7, 2009 at 7:37 AM, Ted
> Harding<ted.hard...@manchester.ac.uk> wrote:
>> On 07-Jun-09 10:56:25, Gabor Grothendieck wrote:
>>> Try this:
>>> site:journal.sjdm.org filetype:R
>>
>> When I entered that into Google, I got only the following two hits:
>>
>>  #
>>  #!/usr/bin/Rscript --vanilla # input is a pre-made list of files ...
>>  #!/usr/bin/Rscript --vanilla # input is a pre-made list of files ending
>>  in html called ../htmlist # (see below). This is easily modified. ...
>>  journal.sjdm.org/RePEc/rss/rss.R - Cached - Similar pages
>>
>>  #
>>  #!/usr/bin/Rscript --vanilla --verbose # script to convert RePEc ...
>>  #!/usr/bin/Rscript --vanilla --verbose # script to convert RePEc-style
>>  rdf files (ReDIFF) to DOAJ-type xml files # usage: oai.R [file] # where
>>  [file] is a ...
>>  journal.sjdm.org/RePEc/rss/oai.R - Cached - Similar pages
>>
>> none of which is what Jonathan is looking for (and the "Similar pages"
>> links are a waste of time).
>>
>> In "regexp language", what he is looking for is
>>
>>  http://journal.sjdm.org/[0-9]+/*.R
>>
>> of which there are several instances on the site, for example
>>
>>  http://journal.sjdm.org/8210/
>>
>> shows
>>
>>   jdm8210.html    13-Dec-2008 11:18    31K
>>   jdm8210.pdf     13-Dec-2008 11:18   102K
>>   jdm8210.tex     13-Dec-2008 11:18    27K
>>   jdm8210001.gif  09-Dec-2008 14:38    11K
>>   probs.R         09-Dec-2008 14:37   1.0K
>>   test.R          23-May-2008 05:46    251
>>   ttest.csv       22-May-2008 21:31   2.6K
>>
>> so there are two ".R" files there (8210 is the number of an article
>> in the Journal). Other similar directories may or may not have
>> ".R" files -- for example
>>  http://journal.sjdm.org/8816/
>> has none.
>>
>> The problem is that utilities like wget won't work in this case,
>> since HTTP doesn't accept "wild cards", unlike FTP; but the journal
>> site doesn't accept FTP ... !!
>>
>> It's an intriguing problem, and I'm seeking advice amongst my Linux
>> acquaintances about it. I somehow doubt that there is a solution ...
>>
>> Ted.
>>> On Sat, Jun 6, 2009 at 6:39 PM, Jonathan Baron<ba...@psych.upenn.edu>
>>> wrote:
>>>> I also use R to redraw figures for the journal I edit (below), when
>>>> the authors cannot produce usable graphics (about 50% of the authors
>>>> who try).
>>>>
>>>> Unfortunately, I cannot find a way to search for just the R files.
>>>> They are all http://journal.sjdm.org/*/*.R
>>>> where * is the number of the article.  But Google, to my knowledge,
>>>> will not deal with wildcards like this.
>>>>
>>>> Jon
>>>> --
>>>> Jonathan Baron, Professor of Psychology, University of Pennsylvania
>>>> Home page: http://www.sas.upenn.edu/~baron
>>>> Editor: Judgment and Decision Making (http://journal.sjdm.org)
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>> --------------------------------------------------------------------
>> E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
>> Fax-to-email: +44 (0)870 094 0861
>> Date: 07-Jun-09                                       Time: 12:37:34
>> ------------------------------ XFMail ------------------------------
>>

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Jun-09                                       Time: 15:01:10
------------------------------ XFMail ------------------------------
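
The "seddery" step mentioned at the top of this message (pulling just the ".R" filenames out of a downloaded directory listing) might look something like the sketch below. It is only an illustration, not something taken from the thread: the function names extract_r_files and sample_listing are made up for the example, it assumes a grep that supports the -o option (GNU and BSD grep do), and it assumes the listing uses href="..." links as in the two rows shown for journal.sjdm.org/8210/.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): read an
# Apache-style directory listing on standard input and print the
# names of the linked .R files, one per line.
extract_r_files() {
    # grep -o prints only the matching href="....R" fragments, one per
    # line; sed then strips the leading href=" and the closing quote.
    grep -o 'href="[^"]*\.R"' | sed -e 's/^href="//' -e 's/"$//'
}

# The two listing rows quoted in the message for the 8210 directory,
# used here as sample input so that the sketch runs without network access.
sample_listing() {
    cat <<'EOF'
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="probs.R">probs.R</a></td><td align="right">09-Dec-2008 14:37 </td><td align="right">1.0K</td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="test.R">test.R</a></td><td align="right">23-May-2008 05:46 </td><td align="right">251 </td></tr>
EOF
}

# Prints probs.R and test.R, one per line.
sample_listing | extract_r_files

# Against the live site the same helper could presumably be fed from
# wget writing to standard output (not run here):
#   wget -q -O - http://journal.sjdm.org/8210/ | extract_r_files
```

Matching on the href attribute rather than on the text between > and < avoids picking up the same name twice per row. This does nothing about the harder part of the problem, which is still knowing the numeric directory names in the first place.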