[R] Chinese characters in HTML source captured by download.file() are garbled; how to convert them to readable text

2013-07-28 Thread Yong Wang
Dear list,
I am using R to download the HTML source of numerous pages, from which
data will be extracted and processed further.
The problem is that the Chinese characters in the HTML source are all
garbled, and I can't find a way to convert them to something readable.
The problem persists on Ubuntu 10 and Windows 7, both in an English
environment. I have not yet tried an operating system set to Chinese.
I know next to nothing about encodings, and an extensive online search
has not helped.

# the code
download.file(
"https://www.google.com.hk/finance/company_news?q=SHA:601857&gl=cn&num=200",
destfile="tmp.txt")
test <- readLines("tmp.txt", encoding="UTF-8")

#the garbled code in tmp.txt and test is like below
#��#22269;�۪o�ѵM�a�ѥ��������q�]�


Any help is highly appreciated.

yong
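
One possible way to approach this, sketched under assumptions: garbling of this kind usually means the page bytes are in a legacy Chinese encoding (for example GBK or Big5) rather than UTF-8. The `encoding` argument of `readLines()` only marks strings as being in an encoding; it does not convert them. The snippet below builds a small GBK file locally instead of downloading anything. The helper name `read_as_utf8` and the choice of GBK are assumptions for illustration; the real page's charset should be checked in its HTTP header or `<meta>` tag.

```r
# Hypothetical helper: read raw lines, then convert them from a known
# source encoding to UTF-8 with iconv().
read_as_utf8 <- function(path, from) {
  lines <- readLines(path, warn = FALSE)
  iconv(lines, from = from, to = "UTF-8")
}

# Build a small GBK-encoded file to stand in for the downloaded page.
original <- "\u4e2d\u56fd\u77f3\u6cb9"
gbk_bytes <- iconv(original, from = "UTF-8", to = "GBK", toRaw = TRUE)[[1]]
tmp <- tempfile(fileext = ".txt")
writeBin(c(gbk_bytes, as.raw(0x0a)), tmp)

# Converting from the correct source encoding recovers the text.
decoded <- read_as_utf8(tmp, from = "GBK")
```

If `iconv()` returns `NA` for a line, the assumed source encoding is probably wrong; `iconvlist()` lists the encoding names available on the platform.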


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to speed up the for loop by releasing memory

2012-12-15 Thread Yong Wang
Dear list;

How can I speed up the following (illustrative) code?
#
con <- vector("numeric")

for (i in 1:limit)
{
if(matched data for the ith item found) {   # pseudocode condition
if(i==1) {con <- RowOfMatchedData } else
{con <- rbind(con,RowOfMatchedData)}
}
}
#

Each RowOfMatchedData contains 105 variables. When i runs past 10^7
and the data container con gets large enough, the code becomes extremely
slow. I know this is a working-memory problem (2GB only). Is there
any way to circumvent it without dicing and slicing the data?
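
A common way to avoid the slowdown, sketched here under assumptions: growing `con` with `rbind()` copies the whole container on every iteration, so collecting rows in a preallocated list and binding once at the end is usually much faster. The names `collect_rows` and `find_match` are hypothetical stand-ins for the real matching step.

```r
# Hypothetical helper: gather matched rows in a list, bind them once.
collect_rows <- function(limit, find_match) {
  rows <- vector("list", limit)          # preallocated, one slot per item
  for (i in seq_len(limit)) {
    hit <- find_match(i)
    if (!is.null(hit)) rows[[i]] <- hit
  }
  rows <- Filter(Negate(is.null), rows)  # drop items without a match
  do.call(rbind, rows)                   # single rbind at the end
}

# Stand-in matcher: only even items "match", each yielding a 105-value row.
find_match <- function(i) if (i %% 2 == 0) c(i, rep(0, 104)) else NULL
con <- collect_rows(10, find_match)
```

This addresses the copying cost, not the memory ceiling: if most of the 10^7 items match, 10^7 rows of 105 doubles is roughly 8.4 GB, which cannot fit in 2 GB however the loop is written, so writing matches to disk in chunks may still be necessary.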



[R] mlogit package, Error in X[omitlines, ] <- NA : subscript out of bounds

2011-04-29 Thread Yong Wang
I am using the mlogit package and have run into a data problem for which I
can't find any clue in the R archives.

The code below shows everything I ran, all the way to the error.

#---
mydata <- data.frame(dependent,x,y,z)

mydata$dependent <- as.factor(mydata$dependent)

mldata <- mlogit.data(mydata, varying=NULL, choice="dependent", shape="wide")

summary(mlogit.1 <- mlogit(dependent~1|x+y+z, data = mldata, reflevel="0"))

Error in X[omitlines, ] <- NA : subscript out of bounds
#---

Could anybody kindly suggest how I might solve this problem?

Thank you

yong



[R] Any package for a Heckman selection model when the outcome equation is also probit?

2011-04-13 Thread Yong Wang
Hi, all

Can anybody tell me whether there is an existing package or function for a
Heckman selection model where the outcome model is also a probit?

In Stata, it is called heckprob.


Thank you

yong



[R] read.table() with "\t" as separator: all other programs report equal fields in each row, read.table() returns unequal row length error

2011-03-16 Thread Yong Wang
Hi, list

R is undoubtedly my favorite statistics tool; however, the data
input part has long been a pain. Most data I have to deal with are
irregular and contain special characters.

Recently I received a tab-delimited data file. read.table(filename, sep="\t")
constantly returns errors saying that certain rows do not have xyz elements,
while other programs such as Perl, Python and awk all report equal row
lengths when "\t" is used as the separator.

I looked through the problematic rows. Sometimes the cause is that a row
contains a #, so I go back and specify comment.char="".
Next it will be some other problem, and for some rows I simply can't
figure out what the problem is.

Can any guru suggest how to spare me this pain now and in the
future? Is CSV a safer format? Or can anyone let me know the
fundamental principles I must bear in mind when doing preliminary data
processing in other programs such as Perl, to ensure the output can
be readily fed into R?

best

yong
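
A sketch of one way to diagnose and read such a file, under the assumption that the mismatch comes from characters that `read.table()` treats specially. By default it interprets both `"` and `'` as quote characters and `#` as a comment character, which tools that split purely on tabs do not. The sample file below is made up for illustration.

```r
# Made-up tab-delimited file containing an apostrophe and a '#'.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("id\tname\tnote",
             "1\tO'Brien\tfine",
             "2\tSmith\t#1 seller"), tmp)

# Count fields per line with quoting and comments switched off, i.e.
# the same way Perl/awk would when splitting on tabs only.
n_fields <- count.fields(tmp, sep = "\t", quote = "", comment.char = "")
bad_lines <- which(n_fields != n_fields[1])   # lines worth inspecting

# Read the file with the same literal interpretation.
dat <- read.table(tmp, sep = "\t", header = TRUE, quote = "",
                  comment.char = "", stringsAsFactors = FALSE)
```

If the upstream program does emit quoted fields, `quote = "\""` (as `read.delim()` uses) may be the better choice than disabling quoting entirely.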



Re: [R] file reading problem unique to windows. Err info: Error in file(file, ifelse(append, "a", "w")). cannot open the connection

2010-11-26 Thread Yong Wang
Thanks a lot, Prof. Ripley. The problem must have been download.file() in
versions prior to R-patched 2.12.
In each loop iteration I tried a couple of candidate links, of which only one
or none would work; the failed attempts must quickly have accumulated more
unclosed destination files than Windows tolerates.
I updated R from 2.10 to R-patched 2.12 and the problem is gone.
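
Independently of the R version, a defensive pattern for this kind of loop can be sketched as follows. It assumes only base R; the helper name `fetch_lines` is hypothetical, and a local `file://` URL stands in for a real web page so that the sketch runs without network access.

```r
# Hypothetical helper: download one page to a temporary file, read it,
# and always release the connection and the file, even on failure.
fetch_lines <- function(url) {
  tmp <- tempfile(fileext = ".html")
  on.exit(unlink(tmp), add = TRUE)

  ok <- tryCatch({
    download.file(url, destfile = tmp, quiet = TRUE)
    TRUE
  }, warning = function(w) FALSE, error = function(e) FALSE)
  if (!ok || !file.exists(tmp)) return(NULL)   # skip links that do not work

  con <- file(tmp, open = "r")
  on.exit(close(con), add = TRUE, after = FALSE)  # close before unlink
  readLines(con, warn = FALSE)
}

# Local stand-in for a web page, addressed through a file:// URL.
src <- tempfile(fileext = ".html")
writeLines(c("<html>", "</html>"), src)
page <- fetch_lines(paste0("file://", normalizePath(src, winslash = "/")))
missing_page <- fetch_lines("file:///no/such/page.html")
```

A failed link then yields `NULL` rather than an error, so the loop can move on to the next candidate.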


On Thu, Nov 25, 2010 at 3:09 AM, Prof Brian Ripley
rip...@stats.ox.ac.uk wrote:
 We don't have any of the information asked for in the posting guide, such as
 your version of R, reproducible example 

 But please try R-patched, since this might be

    • download.file() could leave the destination file open if the URL
      was not able to be opened.  (PR#14414)

 (If you had followed the posting guide you would have tried R-patched before
 posting )


 On Wed, 24 Nov 2010, Yong Wang wrote:

 Dear List

 I asked this question before, got some tips but can't get it solved.

 Where?  You didn't give a reference, and it would have helped the helpers.

 it is clear now that this problem only occurs when run on windows (I
 tested it on windows XP) nothing wrong at all when run on unix.
 unfortunately, sometimes I have to run it on windows,
 I appreciate any suggestion on how to circumvent this problem when run
 on windows.
 below is the problem description I submitted before.

 #

 I am running a loop downloading  web pages and save the html to a
 temporary file (use download.file() )
 then read (using readLines)  it in for processing;
 finally write useful info from each processed page to a unique file

 the problem is once the loop runs up to somewhere near  5000, it will
 throw out an err like below and won't go further.

 
 Error in file(file, ifelse(append, "a", "w")) :
 cannot open the connection
 -

 In the meantime, a request for new connection won't be successful, for
 example, a request for the help page of file will trigger err below

 ---
 ?file
 Error in gzfile(file, "rb") : cannot open the connection
 In addition: Warning message:
 In gzfile(file, "rb") :
 cannot open compressed file
 'C:/PROGRA~1/R/R-211~1.1/library/stats/help/aliases.rds', probable
 reason 'Too many open files'
 ---

 I am not sure if the problem is too many connections not closed. since
 I close the file connection after each readLines.
 checking with showConnections(all=T) does not show excessive
 connections and closeAllConnections() does not help.

 Can any one help me on this?


 Any answer highly appreciated.

 yong



 --
 Brian D. Ripley,                  rip...@stats.ox.ac.uk
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford,             Tel:  +44 1865 272861 (self)
 1 South Parks Road,                     +44 1865 272866 (PA)
 Oxford OX1 3TG, UK                Fax:  +44 1865 272595



[R] file reading problem unique to windows. Err info: Error in file(file, ifelse(append, "a", "w")). cannot open the connection

2010-11-24 Thread Yong Wang
Dear List

I asked this question before and got some tips, but could not get it solved.
It is clear now that the problem only occurs when running on Windows (I
tested it on Windows XP); nothing goes wrong at all on Unix.
Unfortunately, I sometimes have to run it on Windows, so
I would appreciate any suggestion on how to circumvent this problem
there.
Below is the problem description I submitted before.

#

I am running a loop that downloads web pages and saves the HTML to a
temporary file (using download.file()),
then reads it in (using readLines()) for processing,
and finally writes the useful information from each processed page to a unique file.

The problem is that once the loop has run for about 5000 iterations, it
throws an error like the one below and won't go any further.


Error in file(file, ifelse(append, "a", "w")) :
 cannot open the connection
-

From then on, any request for a new connection fails as well; for
example, asking for the help page of file triggers the error below.

---
 ?file
Error in gzfile(file, "rb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "rb") :
 cannot open compressed file
'C:/PROGRA~1/R/R-211~1.1/library/stats/help/aliases.rds', probable
reason 'Too many open files'
---

I am not sure whether the problem is too many unclosed connections, since
I close the file connection after each readLines().
Checking with showConnections(all=T) does not show excessive
connections, and closeAllConnections() does not help.

Can anyone help me with this?


Any answer is highly appreciated.

yong



[R] What does this error mean and how to solve it? Error in file(file, ifelse(append, "a", "w"))

2010-10-06 Thread Yong Wang
Dear List
I am running a loop that downloads web pages and saves the HTML to a
temporary file (using download.file()),
then reads it in (using readLines()) for processing,
and finally writes the useful information from each processed page to a unique file.

The problem is that once the loop has run for about 5000 iterations, it
throws an error like the one below and won't go any further.


Error in file(file, ifelse(append, "a", "w")) :
 cannot open the connection
-

From then on, any request for a new connection fails as well; for
example, asking for the help page of file triggers the error below.

---
 ?file
Error in gzfile(file, "rb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "rb") :
 cannot open compressed file
'C:/PROGRA~1/R/R-211~1.1/library/stats/help/aliases.rds', probable
reason 'Too many open files'
---

I am not sure whether the problem is too many unclosed connections, since
I close the file connection after each readLines().
Checking with showConnections(all=T) does not show excessive
connections, and closeAllConnections() does not help.

Can anyone help me with this?


Any answer is highly appreciated.

yong



[R] How to store regex expression in a variable

2010-09-25 Thread Yong Wang
Dear list

I know how to store a regular expression in Perl and Ruby, but have no clue how to do it in R.
I have read the R regex manual and the archives, and searched online;
still, I need somebody to help me out on how to store a regular expression
in a variable.

Thank you very much

yong
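
For what it is worth, a minimal sketch: R has no separate regex literal, so a pattern is an ordinary character string that can be assigned to a variable and passed to `grepl()`, `sub()`, `regmatches()` and friends. The pattern and sample strings below are made up for illustration; note that backslashes must be doubled inside R string literals.

```r
# The pattern is just a character string held in a variable.
pat <- "^([A-Za-z]+)_([0-9]+)$"

x <- c("item_42", "bad-entry", "row_7")

hits <- grepl(pat, x)               # which elements match the pattern
ids  <- sub(pat, "\\2", x[hits])    # second capture group of each match
```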



[R] Read in a all-character file and specify field separator and records separator

2010-08-29 Thread Yong Wang
Dear list

I used to do preliminary processing in Python or awk and then feed the result
into R. In some circumstances, moving the data between the tools becomes quite a pain.
I am wondering whether there is a convenient way to read a text file into R
(not data; a text file in the common sense) while specifying the field separator
and the record separator, so that the whole job can be reduced to one-stop
shopping.
Or, more simply, is there an easy way to read in a text file with each
row in a single column? scan(sep="\n") does not work as expected.

Thanks

yong
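
A small sketch of the usual base-R tools for this, using a made-up file: `readLines()` returns one string per line, and `strsplit()` handles arbitrary field or record separators. One assumption about the `scan()` problem: `scan()` expects numeric input unless `what` says otherwise, so `what = ""` is needed for text.

```r
tmp <- tempfile(fileext = ".txt")
writeLines(c("alpha|1|x", "beta|2|y"), tmp)

# One element per line of the file, with no parsing of fields at all.
lines <- readLines(tmp)

# scan() can do the same if told to expect character data.
lines_scan <- scan(tmp, what = "", sep = "\n", quiet = TRUE)

# Split each record on an arbitrary field separator.
fields <- do.call(rbind, strsplit(lines, "|", fixed = TRUE))

# Arbitrary record separator: treat the text as one string, then split.
whole   <- paste(lines, collapse = ";")   # stand-in for text using ";" between records
records <- strsplit(whole, ";", fixed = TRUE)[[1]]
```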



[R] How to execute multiple R scripts sequentially in unix background

2010-05-21 Thread Yong Wang
Dear list

I need to 1) run several R scripts sequentially, because each waits on
results from the previous one, and 2) run them in the Unix background, since
my ssh session frequently times out for some reason.
If I paste the following commands into the Unix shell

R --vanilla < script1 &
R --vanilla < script2 &
R --vanilla < script3 &

the three scripts are executed simultaneously instead of sequentially.

source() might be an alternative; however, it is not clear to me how to run
it in the background.

Thanks

yong
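
One way to combine both requirements, sketched with hypothetical names: a single driver script calls `source()` on each script in turn, and only that driver is sent to the background. The throwaway scripts and the `run_sequentially` helper below exist only to show that the order is preserved.

```r
# Hypothetical driver: run each script only after the previous one finished.
run_sequentially <- function(scripts) {
  for (s in scripts) {
    message("Running ", s)
    source(s, local = new.env())   # fresh environment per script
  }
  invisible(scripts)
}

# Three throwaway scripts that each append their number to a shared log.
log_file <- chartr("\\", "/", tempfile(fileext = ".log"))
scripts <- vapply(1:3, function(i) {
  path <- tempfile(fileext = ".R")
  writeLines(sprintf('cat(%d, "\\n", sep = "", file = "%s", append = TRUE)',
                     i, log_file), path)
  path
}, character(1))

run_sequentially(scripts)
run_order <- readLines(log_file)
```

Saved as, say, `run_all.R` (a hypothetical file name), the driver could be launched with something like `nohup Rscript run_all.R > run_all.log 2>&1 &`, so that it keeps running after the ssh session drops.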



[R] Why does eval(parse(text="var(vec)")) return a matrix but NOT a number?

2010-03-17 Thread Yong Wang
Dear List

I have run into a problem when using eval(parse()).
The code below sketches what I am trying to do:

For each row of an N*K data frame (I use a 2*2 data frame in the example below),
apply a number of functions and collect the outputs (two functions,
sum and var, are used in the example below).

The problem is that eval(parse(text="sum(para)")) works fine, but not when
sum is replaced by var.
In the latter case, a matrix instead of a number is returned.

Any suggestion highly appreciated.

Thank you

#===The function
myloop <- function(datfra,funs) {

rows <- dim(datfra)[1];
totfunnum <- length(funs);

for (i in 1:rows)   {
vec <- datfra[i,];

for(k in 1:totfunnum)   {
print(funs[k]);
x <- eval(parse(text=funs[k]));
print(x);
}

}
}


#Experimental run
workport <- data.frame(matrix(1:4,2,2))
funs <- c("sum(vec,na.rm=T)","var(vec,na.rm=T)")

myloop(workport,funs)

# Outputs of the Experimental run

[1] "sum(vec,na.rm=T)"
[1] 4
[1] "var(vec,na.rm=T)"
   X1 X2
X1 NA NA
X2 NA NA
[1] "sum(vec,na.rm=T)"
[1] 6
[1] "var(vec,na.rm=T)"
   X1 X2
X1 NA NA
X2 NA NA
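
A likely explanation, with a sketch of one fix: `datfra[i,]` is a one-row data frame, not a vector. `sum()` adds up all its cells, but `var()` on a data frame computes the covariance matrix between columns, which is all `NA` with a single row. Converting the row with `unlist()` gives a plain numeric vector. The helper `row_stats` below is hypothetical and passes functions directly instead of strings to `eval(parse())`.

```r
# Hypothetical helper: apply each function to each row as a numeric vector.
row_stats <- function(datfra, funs) {
  out <- matrix(NA_real_, nrow(datfra), length(funs),
                dimnames = list(NULL, names(funs)))
  for (i in seq_len(nrow(datfra))) {
    vec <- unlist(datfra[i, ])   # numeric vector, not a one-row data frame
    for (k in seq_along(funs)) {
      out[i, k] <- funs[[k]](vec, na.rm = TRUE)
    }
  }
  out
}

workport <- data.frame(matrix(1:4, 2, 2))
res <- row_stats(workport, list(sum = sum, var = var))
```

With the example data, the rows are (1, 3) and (2, 4), so each row has a sum of 4 or 6 and a variance of 2.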



[R] Interaction of a continuous variable and a categorical variable: any way to treat the categorical variable as continuous?

2009-03-26 Thread Yong Wang
Dear list,

This is NOT a technical question regarding the use of R.

I have a linear model where the response variable is neighborhood
safety. It is hypothesized that poverty reduces safety and that the number of
officers per thousand residents improves safety. The focal hypothesis
is that poverty poses less of a threat to safety when the number of officers is high.

To check the focal hypothesis, the continuous variable officers was
recoded as categorical with two levels (high and low). The results are
below and support the hypothesis.

#=
model <- lm(neighborhood_safety ~ poverty * officers)
The coefficients (all significant):
poverty              -0.05
officers              0.058
poverty : officers    0.014
#==

My question is how to check whether the poverty effect weakens with a
minuscule increase in officers. The coefficient for the interaction
term of continuous poverty and continuous officers is hard to interpret and
does not seem suitable for checking the focal hypothesis since, say, (poverty=3 &
officers=8) will give the same product as (poverty=8 & officers=3).

Thanks a lot in advance for any suggestions!

Sincerely,

Will
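
One way to read a continuous-by-continuous interaction, sketched on simulated data (the data, seed and coefficient values below are assumptions chosen only to resemble the post, not the poster's results): the slope of poverty is no longer a single number but `b_poverty + b_interaction * officers`, so it can be evaluated at any level of officers. The product term is equal for (3, 8) and (8, 3), but the main effects differ, so the fitted values do not coincide.

```r
# Simulated data; all numbers here are illustrative assumptions.
set.seed(1)
n <- 500
poverty  <- runif(n, 0, 10)
officers <- runif(n, 0, 10)
safety <- 5 - 0.05 * poverty + 0.058 * officers +
  0.014 * poverty * officers + rnorm(n, sd = 0.1)

fit <- lm(safety ~ poverty * officers)
b <- coef(fit)

# Marginal effect of poverty at a given (continuous) level of officers.
poverty_slope <- function(officers_level) {
  unname(b["poverty"] + b["poverty:officers"] * officers_level)
}

slope_low  <- poverty_slope(3)
slope_high <- poverty_slope(8)

# Same product term, different predictions.
pred <- predict(fit, data.frame(poverty = c(3, 8), officers = c(8, 3)))
```

A positive interaction coefficient then means each additional officer shifts the poverty slope upward by that amount, which is the "weakened poverty effect" in continuous form.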



[R] UNIX Installation of package systemfit fails

2009-02-05 Thread Yong Wang
Dear list
I am trying to install the systemfit package under Unix with

install.packages("systemfit")

but the installation failed. I am attaching the error and version
information below
(with dependencies=TRUE, there are many more errors).
Any help is appreciated.

best,
yong

=
> install.packages("systemfit")
Warning in install.packages("systemfit") :
  argument 'lib' is missing: using
'/usr/home/d/068/meta/R/x86_64-unknown-linux-gnu-library/2.7'
--- Please select a CRAN mirror for use in this session ---
CRAN mirror

 1: Argentina                  2: Australia
 3: Austria                    4: Belgium
 5: Brazil (PR)                6: Brazil (RJ)
 7: Brazil (SP 1)              8: Brazil (SP 2)
 9: Canada (BC)               10: Canada (ON)
11: Chile                     12: China
13: Croatia                   14: Denmark
15: France (Toulouse)         16: France (Lyon)
17: France (Paris)            18: Germany (Goettingen)
19: Germany (Muenchen)        20: Iran
21: Ireland                   22: Italy (Milano)
23: Italy (Padua)             24: Italy (Palermo)
25: Japan (Aizu)              26: Japan (Tokyo)
27: Japan (Tsukuba)           28: Korea
29: Mexico                    30: Netherlands (Amsterdam 2)
31: Netherlands (Utrecht)     32: New Zealand
33: Norway                    34: Poland (Oswiecim)
35: Poland (Wroclaw)          36: Portugal
37: Russia                    38: Singapore 1
39: Singapore 2               40: Slovenia (Ljubljana)
41: South Africa              42: Spain (Madrid)
43: Sweden                    44: Switzerland
45: Taiwan (Taichung)         46: Taiwan (Taipeh)
47: Thailand                  48: Turkey
49: UK (Bristol)              50: USA (CA 1)
51: USA (CA 2)                52: USA (IA)
53: USA (MI)                  54: USA (MO)
55: USA (NC)                  56: USA (OH)
57: USA (PA 1)                58: USA (PA 2)
59: USA (TX 1)                60: USA (TX 2)
61: USA (WA)

Selection: 57
also installing the dependencies 'zoo', 'Matrix', 'car', 'lmtest'

trying URL 'http://lib.stat.cmu.edu/R/CRAN/src/contrib/zoo_1.5-4.tar.gz'
Content type 'application/x-gzip' length 609057 bytes (594 Kb)
opened URL
==
downloaded 594 Kb

trying URL 
'http://lib.stat.cmu.edu/R/CRAN/src/contrib/Matrix_0.999375-20.tar.gz'
Content type 'application/x-gzip' length 1954872 bytes (1.9 Mb)
opened URL
==
downloaded 1.9 Mb

trying URL 'http://lib.stat.cmu.edu/R/CRAN/src/contrib/car_1.2-12.tar.gz'
Content type 'application/x-gzip' length 220728 bytes (215 Kb)
opened URL
==
downloaded 215 Kb

trying URL 'http://lib.stat.cmu.edu/R/CRAN/src/contrib/lmtest_0.9-22.tar.gz'
Content type 'application/x-gzip' length 191099 bytes (186 Kb)
opened URL
==
downloaded 186 Kb

trying URL 'http://lib.stat.cmu.edu/R/CRAN/src/contrib/systemfit_1.0-8.tar.gz'
Content type 'application/x-gzip' length 727116 bytes (710 Kb)
opened URL
==
downloaded 710 Kb

ERROR: failed to lock directory
'/usr/home/d/068/wangyong/R/x86_64-unknown-linux-gnu-library/2.7' for
modifying
Try removing 
'/usr/home/d/068/wangyong/R/x86_64-unknown-linux-gnu-library/2.7/00LOCK'
ERROR: failed to lock directory
'/usr/home/d/068/wangyong/R/x86_64-unknown-linux-gnu-library/2.7' for
modifying
Try removing 
'/usr/home/d/068/wangyong/R/x86_64-unknown-linux-gnu-library/2.7/00LOCK'
ERROR: failed to lock directory
'/usr/home/d/068/wangyong/R/x86_64-unknown-linux-gnu-library/2.7' for
modifying
Try removing 
'/usr/home/d/068/wangyong/R/x86_64-unknown-linux-gnu-library/2.7/00LOCK'
ERROR: failed to lock directory
'/usr/home/d/068/wangyong/R/x86_64-unknown-linux-gnu-library/2.7' for
modifying
Try removing 
'/usr/home/d/068/wangyong/R/x86_64-unknown-linux-gnu-library/2.7/00LOCK'
ERROR: failed to lock directory
'/usr/home/d/068/wangyong/R/x86_64-unknown-linux-gnu-library/2.7' for
modifying
Try removing 
'/usr/home/d/068/wangyong/R/x86_64-unknown-linux-gnu-library/2.7/00LOCK'

The downloaded packages are in
/tmp/RtmpJ28hPv/downloaded_packages
Warning messages:
1: In install.packages("systemfit") :
  installation of package 'zoo' had non-zero exit status
2: In install.packages("systemfit") :
  installation of package 'Matrix' had non-zero exit status
3: In install.packages("systemfit") :
  installation of package 'car' had non-zero exit status
4: In install.packages("systemfit") :
  installation of package 'lmtest' had non-zero exit status
5: In install.packages("systemfit") :
  installation of package 'systemfit' had non-zero exit status



> version
               _
platform       x86_64-unknown-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          2
minor          7.0
year           2008
month          04
day            22
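
The log itself suggests removing the `00LOCK` directory, which is typically left behind by an interrupted earlier installation. A minimal sketch of doing that from within R follows; the helper name `clear_stale_locks` is hypothetical, and the demonstration runs on a throwaway library directory rather than a real one. It should only be used when no other installation is running in that library.

```r
# Hypothetical helper: remove leftover 00LOCK* directories from a library.
clear_stale_locks <- function(lib = .libPaths()[1]) {
  locks <- list.files(lib, pattern = "^00LOCK", full.names = TRUE)
  unlink(locks, recursive = TRUE)
  invisible(basename(locks))
}

# Throwaway library with a stale lock and one "installed" package.
lib <- tempfile("library")
dir.create(file.path(lib, "00LOCK"), recursive = TRUE)
dir.create(file.path(lib, "zoo"))

removed <- clear_stale_locks(lib)
```

After clearing the lock in the real library, `install.packages("systemfit")` can be retried.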

[R] Problem with package installation on Unix, the following code gives no response: install.packages(packagename, dependencies=TRUE)

2008-11-10 Thread Yong Wang
Dear list

I am trying to install a package under Unix. The command below
works in some cases but not in others; the main
symptom is that R stops with the following message (say I am trying to
install the package SASxport):

###
Warning in install.packages("SASxport", dependencies = TRUE) :
  argument 'lib' is missing: using
'/usr/home/d/068/wangyong/R/x86_64-unknown-linux-gnu-library/2.7'
Warning: unable to access index for repository
http://lib.stat.cmu.edu/R/CRAN/src/contrib
#

My internet connection has no problem, so why does this happen?
Or can you suggest some more options or examples to follow for package
installation under Unix?


Thank you

will
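
One thing that may be worth trying, as a sketch only: name the repository explicitly, so that a single unreachable mirror is not the only source. The mirror URL below is the generic CRAN redirector and is an assumption about what is reachable from the poster's machine; the `install.packages()` call is commented out because it needs network access.

```r
# Choose a repository explicitly instead of relying on the session default.
repos <- c(CRAN = "https://cloud.r-project.org")

# This is the index location install.packages() will try to read for
# source packages from that repository.
index_url <- contrib.url(repos, type = "source")

options(repos = repos)
# install.packages("SASxport", dependencies = TRUE)   # requires network access
```

If the index is still unreachable from R while a browser on the same machine can open it, a proxy or firewall setting for the R session is a plausible suspect.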



[R] Help on a package or code for a simultaneous equation probit (logit) model

2008-10-27 Thread Yong Wang
Dear List
I am trying to fit a simultaneous equation logit model, i.e., the
response variables of the structural equations are binomial. I am not
sure whether systemfit can do this job, and a Google search did not yield
much helpful information. Any pointers to other packages or
code would be appreciated.

Thanks

will



[R] lm error, and how to sidestep an error that occurs in a for loop so it keeps going without being interrupted

2007-09-27 Thread Yong Wang
Dear Rlist

I am running a for loop on a large dataset for exploratory
investigation. The code inside the loop includes a call to lm.
Unfortunately, for some specifications of the dependent variable, the loop
is interrupted by the error below:

Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :
contrasts can be applied only to factors with 2 or more levels

I suspected this might be caused by missing values which, once removed,
leave some factors with values on only one level. It turns out this
is not true.

Answers to the following two questions would be appreciated.
1. What might be the possible reason behind the error message?
2. If I simply want to circumvent this error and keep the for loop
going, how should I do that?


Regards
young
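
For the second question, a sketch using `tryCatch()`, which lets the loop record the failure and carry on. The data are simulated, and the setup (a factor that has a single level within one subset) is only one common trigger of this error, not necessarily the cause in the poster's data.

```r
# Simulated data: block 2 contains a single level of g, which makes
# lm() fail with the contrasts error for that block only.
set.seed(42)
dat <- data.frame(
  y = rnorm(30),
  x = rnorm(30),
  g = c(rep(c("a", "b"), 5), rep("a", 10), rep(c("a", "b"), 5)),
  block = rep(1:3, each = 10)
)

fits <- vector("list", 3)
for (b in 1:3) {
  sub <- dat[dat$block == b, ]
  sub$g <- factor(sub$g)
  # On failure, keep the condition object instead of stopping the loop.
  fits[[b]] <- tryCatch(lm(y ~ x + g, data = sub), error = function(e) e)
}

# Which iterations failed, for later inspection of their messages.
failed <- vapply(fits, inherits, logical(1), what = "error")
```

Storing the condition object rather than `NULL` keeps the list length intact and preserves the message via `conditionMessage(fits[[b]])`.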
