Re: [R] require help

2017-09-15 Thread Berend Hasselman

> On 15 Sep 2017, at 11:38, yadav neog  wrote:
> 
> hello to all. I am working on macroeconomic data series of India, which in
> a yearly basis. I am unable to convert my data frame into time series.
> kindly help me.
> also using zoo and xts packages. but they take only monthly observations.
> 
> 'data.frame': 30 obs. of  4 variables:
> $ year: int  1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 ...
> $ cnsm: num  174 175 175 172 173 ...
> $ incm: num  53.4 53.7 53.5 53.2 53.3 ...
> $ wlth: num  60.3 60.5 60.2 60.1 60.7 ...
> -- 

Second try at doing what you would like (I hope and think), using Eric's sample data:


zdf <- data.frame(year=2001:2010, cnsm=sample(170:180,10,replace=TRUE),
 incm=rnorm(10,53,1), wlth=rnorm(10,60,1))
zdf

# base R ts: with only a start year given, this is an annual series (frequency = 1)
zts <- ts(zdf[,-1], start=zdf[1,"year"])
zts

# turn data into a zoo timeseries and an xts timeseries

library(zoo)
z.zoo <- as.zoo(zts)
z.zoo

library(xts)
z.xts <- as.xts(zts)
z.xts


Berend Hasselman

> Yadawananda Neog
> Research Scholar
> Department of Economics
> Banaras Hindu University
> Mob. 9838545073



Re: [R] Regarding Principal Component Analysis result Interpretation

2017-09-15 Thread Bert Gunter
This list is about R programming, not statistics, although they do often
intersect. Nevertheless, this discussion seems to be all about the latter,
not the former, so I think you would do better bringing it to a statistics
list like stats.stackexchange.com rather than here.

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Sep 15, 2017 at 5:12 AM, Ismail SEZEN  wrote:

> First, see the example at https://isezen.github.io/PCA/
>
> > On 15 Sep 2017, at 13:43, Shylashree U.R 
> wrote:
> >
> > Dear Sir/Madam,
> >
> > I am trying to do PCA analysis with "iris" dataset and trying to interpret
> > the result. Dataset contains 150 obs of 5 variables
> >
> >     Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
> > 1            5.1          3.5           1.4          0.2  setosa
> > 2            4.9          3.0           1.4          0.2  setosa
> > .
> > .
> > 150          5.9          3.0           5.1          1.8  virginica
> >
> > now I used 'prcomp' function on dataset and got result as following:
> >> print(pc)
> > Standard deviations (1, .., p=4):
> > [1] 1.7083611 0.9560494 0.3830886 0.1439265
> >
> > Rotation (n x k) = (4 x 4):
> >PC1 PC2PC3PC4
> > Sepal.Length  0.5210659 -0.37741762  0.7195664  0.2612863
> > Sepal.Width  -0.2693474 -0.92329566 -0.2443818 -0.1235096
> > Petal.Length  0.5804131 -0.02449161 -0.1421264 -0.8014492
> > Petal.Width   0.5648565 -0.06694199 -0.6342727  0.5235971
> >
> > I'm planning to use PCA as feature selection process and remove variables
> > which are corelated in my project, I have interpreted the PCA result, but
> > not sure is my interpretation is correct or wrong.
>
>
> You want to “remove variables which are correlated”. Correlated among
> themselves? If so, why don’t you create a pearson correlation matrix (see
> ?cor) and define a threshold and remove variables which are correlated
> according to this threshold? Perhaps I did not understand you correctly,
> excuse me.
>
> for iris dataset, each component will be as much as correlated with PC1
> and remaining part will be correlated PC2 and so on. Hence, you can
> identify which variables are similar in terms of VARIANCE. You can
> understand it if you examine the example that I gave above.
>
> In PCA, you can also calculate the correlations between variables and PCs
> but this shows you how PCs are affected by this variables. I don’t know how
> you plan to accomplish feature selection process so I hope this helps you.
> Also note that resources part at the end of example.
>
> isezen


Re: [R] require help

2017-09-15 Thread yadav neog
Thanks, Eric. Actually, my data do not specify the months, so I am bound to
treat them as yearly data. I have also attached a sample data set that may be
helpful to you in providing suggestions. Thank you.

On Fri, Sep 15, 2017 at 5:23 PM, Ismail SEZEN  wrote:

>
> > On 15 Sep 2017, at 12:38, yadav neog  wrote:
> >
> > hello to all. I am working on macroeconomic data series of India, which
> in
> > a yearly basis. I am unable to convert my data frame into time series.
>
>
> Do you really need to convert your data to time series/xts/zoo? I don’t
> know you try what kind of an analysis but perhaps you don’t have to.
>
> > kindly help me.
> > also using zoo and xts packages. but they take only monthly observations.
>
> If you really have to convert to xts/zoo, why don’t yo set each year to
> first day of January and use it as is? For instance,
>
> index, cnsm, incm, wlth
> 1980-01-01, 174, 53.4, 60.3
> 1981-01-01, 175, 53.7, 60.5
> 1982-01-01, 175, 53.5, 60.2
> …..
>
> >
> > 'data.frame': 30 obs. of  4 variables:
> > $ year: int  1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 ...
> > $ cnsm: num  174 175 175 172 173 ...
> > $ incm: num  53.4 53.7 53.5 53.2 53.3 ...
> > $ wlth: num  60.3 60.5 60.2 60.1 60.7 ...
> > --
> > Yadawananda Neog
> > Research Scholar
> > Department of Economics
> > Banaras Hindu University
> > Mob. 9838545073
> >
>
>


-- 
Yadawananda Neog
Research Scholar
Department of Economics
Banaras Hindu University
Mob. 9838545073

Re: [R] Regarding Principal Component Analysis result Interpretation

2017-09-15 Thread Ismail SEZEN
First, see the example at https://isezen.github.io/PCA/

> On 15 Sep 2017, at 13:43, Shylashree U.R  wrote:
> 
> Dear Sir/Madam,
> 
> I am trying to do PCA analysis with "iris" dataset and trying to interpret
> the result. Dataset contains 150 obs of 5 variables
> 
>     Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
> 1            5.1          3.5           1.4          0.2  setosa
> 2            4.9          3.0           1.4          0.2  setosa
> .
> .
> 150          5.9          3.0           5.1          1.8  virginica
> 
> now I used 'prcomp' function on dataset and got result as following:
>> print(pc)
> Standard deviations (1, .., p=4):
> [1] 1.7083611 0.9560494 0.3830886 0.1439265
> 
> Rotation (n x k) = (4 x 4):
>PC1 PC2PC3PC4
> Sepal.Length  0.5210659 -0.37741762  0.7195664  0.2612863
> Sepal.Width  -0.2693474 -0.92329566 -0.2443818 -0.1235096
> Petal.Length  0.5804131 -0.02449161 -0.1421264 -0.8014492
> Petal.Width   0.5648565 -0.06694199 -0.6342727  0.5235971
> 
> I'm planning to use PCA as feature selection process and remove variables
> which are corelated in my project, I have interpreted the PCA result, but
> not sure is my interpretation is correct or wrong.


You want to “remove variables which are correlated”. Correlated among
themselves? If so, why don’t you create a Pearson correlation matrix (see ?cor),
define a threshold, and remove variables that are correlated beyond that
threshold? Perhaps I did not understand you correctly; if so, excuse me.
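
For instance, a minimal base-R sketch of that idea (the 0.9 cutoff is only an
illustration, not a recommendation):

cm <- abs(cor(iris[, 1:4]))          # Pearson correlations among the numeric columns
cm[upper.tri(cm, diag = TRUE)] <- 0  # keep each pair only once
drop <- unique(colnames(cm)[which(cm > 0.9, arr.ind = TRUE)[, "col"]])
drop                                 # candidate variables to remove
iris_reduced <- iris[, setdiff(names(iris), drop)]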

For the iris dataset, each variable is correlated with PC1 as much as possible,
and the remaining variation is correlated with PC2, and so on. Hence, you can
identify which variables are similar in terms of VARIANCE. You will understand
this if you examine the example that I gave above.

In PCA, you can also calculate the correlations between the variables and the
PCs, but this shows you how the PCs are affected by those variables. I don’t
know how you plan to accomplish the feature-selection process, so I hope this
helps you. Also note the resources section at the end of the example.

isezen

Re: [R] Calculating Weeks Since Last Event

2017-09-15 Thread jim holtman
Try this:


# supplied data
library(zoo)  # need the 'na.locf' function

x <- structure(list(ScanDate = structure(c(16433, 16440, 16447, 16454,
         16461, 16468, 16475, 16482, 16489, 16496, 16503, 16510, 16517,
         16524, 16531, 16538, 16545, 16552, 16559, 16566, 16573, 16580,
         16587, 16594, 16601, 16608, 16615, 16622), class = "Date"),
       OnPromotion = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
         0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1)),
  .Names = c("ScanDate", "OnPromotion"), sorted = "ScanDate",
  class = c("data.table", "data.frame"), row.names = c(NA, -28L))


# find where the promotions start and then create a flag that indicates when
# the previous promotion started
indx <- which(x$OnPromotion == 1)[1]  # get initial promotion
if (is.na(indx)) stop('no promotions in the data')  # make sure there is at least one

# add a column with the running total of promotions
x$count <- c(rep(0, indx - 1), seq(0, length = nrow(x) - indx + 1))
x$flag <- x$count  # save a copy

# now replace no promotions with NAs so we can use 'na.locf'
indx <- (x$OnPromotion == 0) & (x$count != 0)
x$flag[indx] <- NA
x$flag <- zoo::na.locf(x$flag)

# determine weeks since
x$weeks_since <- ifelse(x$count != 0,
                        x$count - x$flag + 1,
                        0)

x  # print out the result


##
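
An alternative and more direct sketch of the stated rule ("weeks since the most
recent earlier promotion week, 0 before the first promotion"), written as a
plain loop; the helper name weeks_since_last is made up:

weeks_since_last <- function(flag) {
  last <- NA_integer_            # index of the most recent promotion week seen so far
  out  <- integer(length(flag))
  for (i in seq_along(flag)) {
    out[i] <- if (is.na(last)) 0L else i - last
    if (flag[i] == 1) last <- i
  }
  out
}

x$weeks_since_loop <- weeks_since_last(x$OnPromotion)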


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Fri, Sep 15, 2017 at 5:02 AM, Abhinaba Roy  wrote:
> Hi,
>
> I have an input data
>
>> dput (input)
>
> structure(list(ScanDate = structure(c(16433, 16440, 16447, 16454,
> 16461, 16468, 16475, 16482, 16489, 16496, 16503, 16510, 16517,
> 16524, 16531, 16538, 16545, 16552, 16559, 16566, 16573, 16580,
> 16587, 16594, 16601, 16608, 16615, 16622), class = "Date"), OnPromotion =
> c(0,
> 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0,
> 0, 0, 1, 1, 1, 1)), .Names = c("ScanDate", "OnPromotion"), sorted =
> "ScanDate", class = c("data.table",
> "data.frame"), row.names = c(NA, -28L))
>
> I am looking for an output
>
>> dput(output)
>
> structure(list(ScanDate = structure(c(16433, 16440, 16447, 16454,
> 16461, 16468, 16475, 16482, 16489, 16496, 16503, 16510, 16517,
> 16524, 16531, 16538, 16545, 16552, 16559, 16566, 16573, 16580,
> 16587, 16594, 16601, 16608, 16615, 16622), class = "Date"), OnPromotion =
> c(0,
> 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0,
> 0, 0, 1, 1, 1, 1), Weeks_Since_Last_Promo = c(0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 3, 4, 1,
> 1, 1)), .Names = c("ScanDate", "OnPromotion", "Weeks_Since_Last_Promo"
> ), sorted = "ScanDate", class = c("data.table", "data.frame"), row.names =
> c(NA,
> -28L))
>
> The logic :
>
> The data is weekly.
>
> I want to calculate the number of weeks elapsed since the last promotion
> (OnPromotion : 1 indicates promotion for that week and 0 indicates no
> promotion).
>
> As, there are no promotion initially we set the value for
> 'Weeks_Since_Last_Promo' to 0 (zero). The first promo occurs on
> '2015-03-02' and 'Weeks_Since_Last_Promo' is still 0. Moving to
> '2015-03-09' there was a promotion the week before and so 1 week elapsed
> after the last promo.
>
> If we look at '2015-06-15' then there was a promo 4 weeks back in the week
> of '2015-05-18' and so 'Weeks_Since_Last_Promo' = 4.
>
> How can we do it in R?
>
> Thanks,
> Abhinaba
>



Re: [R] require help

2017-09-15 Thread Berend Hasselman

> On 15 Sep 2017, at 16:35, Berend Hasselman  wrote:
> 
>> 
>> On 15 Sep 2017, at 11:38, yadav neog  wrote:
>> 
>> hello to all. I am working on macroeconomic data series of India, which in
>> a yearly basis. I am unable to convert my data frame into time series.
>> kindly help me.
>> also using zoo and xts packages. but they take only monthly observations.
>> 
>> 'data.frame': 30 obs. of  4 variables:
>> $ year: int  1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 ...
>> $ cnsm: num  174 175 175 172 173 ...
>> $ incm: num  53.4 53.7 53.5 53.2 53.3 ...
> 
> 
> It shouldn't be difficult.
> Example:
> 
> tsdata <- data.frame(year=c(2000,2002,2003), x=c(1,2,3),y=c(10,11,12))
> xy.ts <- as.ts(tsdata)
> 
> library(zoo)
> 
> as.zoo(xy.ts)


Ignore my suggestion.  Doesn't do what you need.

Berend



Re: [R] require help

2017-09-15 Thread Berend Hasselman

> On 15 Sep 2017, at 11:38, yadav neog  wrote:
> 
> hello to all. I am working on macroeconomic data series of India, which in
> a yearly basis. I am unable to convert my data frame into time series.
> kindly help me.
> also using zoo and xts packages. but they take only monthly observations.
> 
> 'data.frame': 30 obs. of  4 variables:
> $ year: int  1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 ...
> $ cnsm: num  174 175 175 172 173 ...
> $ incm: num  53.4 53.7 53.5 53.2 53.3 ...


It shouldn't be difficult.
Example:

tsdata <- data.frame(year=c(2000,2002,2003), x=c(1,2,3),y=c(10,11,12))
xy.ts <- as.ts(tsdata)

library(zoo)

as.zoo(xy.ts)


Berend Hasselman

> $ wlth: num  60.3 60.5 60.2 60.1 60.7 ...
> -- 
> Yadawananda Neog
> Research Scholar
> Department of Economics
> Banaras Hindu University
> Mob. 9838545073



Re: [R] How to add make option to package compilation?

2017-09-15 Thread Martin Morgan

On 09/15/2017 08:57 AM, Michael Dewey wrote:

> In line
>
> On 15/09/2017 13:30, Martin Møller Skarbiniks Pedersen wrote:
>
> > On 15 September 2017 at 14:13, Duncan Murdoch wrote:
> >
> > > On 15/09/2017 8:11 AM, Martin Møller Skarbiniks Pedersen wrote:
> > >
> > > > Hi,
> > > >
> > > > I am installing a lot of packages to a new R installation and it takes a
> > > > long time. However the machine got 4 cpus and most of the packages are
> > > > written in C/C++.
> > > >
> > > > So is it possible to add a -j4 flag to the make command when I use the
> > > > install.packages() function?
> > > > That will probably speed up the package installation process 390%.
> > >
> > > See the Ncpus argument in ?install.packages.
> >
> > Thanks.
> >
> > However it looks like Ncpus=4 tries to compile four R packages at the
> > same time, using one CPU for each package.


The variable MAKE is defined in ${R_HOME}/etc/Renviron, and can be
overridden in ~/.Renviron with


MAKE=make -j

There is further discussion in


https://cran.r-project.org/doc/manuals/r-release/R-admin.html#Configuration-variables

and ?Renviron.

One could configure a source installation to always compile with make -j,
something like ./configure MAKE="make -j".
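
For example (a sketch; the core count and the package name are only
illustrations), putting

MAKE=make -j4

in ~/.Renviron means that a subsequent

install.packages("data.table", type = "source")

builds that package's compiled sources with up to 4 parallel make jobs.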


Martin





> But you said you had lots to install so would that not speed things up too?
>
> > From the documentation:
> > "
> > Ncpus: the number of parallel processes to use for a parallel
> >    install of more than one source package.  Values greater than
> >    one are supported if the ‘make’ command specified by
> >    ‘Sys.getenv("MAKE", "make")’ accepts argument ‘-k -j Ncpus’
> > "


Re: [R] How to add make option to package compilation?

2017-09-15 Thread Michael Dewey

In line

On 15/09/2017 13:30, Martin Møller Skarbiniks Pedersen wrote:

On 15 September 2017 at 14:13, Duncan Murdoch 
wrote:


On 15/09/2017 8:11 AM, Martin Møller Skarbiniks Pedersen wrote:


Hi,

I am installing a lot of packages to a new R installation and it takes
a
long time.
However the machine got 4 cpus and most of the packages are written in
C/C++.

So is it possible to add a -j4 flag to the make command when I use the
install.packages() function?
That will probably speed up the package installation process 390%.



See the Ncpus argument in ?install.packages.



Thanks.

However it looks like Ncpus=4 tries to compile four R packages at the same
time using one cpu for each packages.



But you said you had lots to install so would that not speed things up too?


 From the documentation:
"
Ncpus: the number of parallel processes to use for a parallel
   install of more than one source package.  Values greater than
   one are supported if the ‘make’ command specified by
   ‘Sys.getenv("MAKE", "make")’ accepts argument ‘-k -j Ncpus’
"




--
Michael
http://www.dewey.myzen.co.uk/home.html


Re: [R] How to add make option to package compilation?

2017-09-15 Thread Martin Møller Skarbiniks Pedersen
On 15 September 2017 at 14:13, Duncan Murdoch 
wrote:

> On 15/09/2017 8:11 AM, Martin Møller Skarbiniks Pedersen wrote:
>
>> Hi,
>>
>>I am installing a lot of packages to a new R installation and it takes
>> a
>> long time.
>>However the machine got 4 cpus and most of the packages are written in
>> C/C++.
>>
>>So is it possible to add a -j4 flag to the make command when I use the
>> install.packages() function?
>>That will probably speed up the package installation process 390%.
>>
>
> See the Ncpus argument in ?install.packages.


Thanks.

However, it looks like Ncpus=4 tries to compile four R packages at the same
time, using one CPU for each package.

From the documentation:
"
   Ncpus: the number of parallel processes to use for a parallel
  install of more than one source package.  Values greater than
  one are supported if the ‘make’ command specified by
  ‘Sys.getenv("MAKE", "make")’ accepts argument ‘-k -j Ncpus’
"


Re: [R] How to add make option to package compilation?

2017-09-15 Thread Duncan Murdoch

On 15/09/2017 8:11 AM, Martin Møller Skarbiniks Pedersen wrote:

Hi,

   I am installing a lot of packages to a new R installation and it takes a
long time.
   However the machine got 4 cpus and most of the packages are written in
C/C++.

   So is it possible to add a -j4 flag to the make command when I use the
install.packages() function?
   That will probably speed up the package installation process 390%.


See the Ncpus argument in ?install.packages.
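
For example (the package names are only placeholders):

install.packages(c("Rcpp", "data.table"), Ncpus = 4)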

Duncan Murdoch


[R] How to add make option to package compilation?

2017-09-15 Thread Martin Møller Skarbiniks Pedersen
Hi,

  I am installing a lot of packages into a new R installation and it takes a
long time.
  However, the machine has 4 CPUs and most of the packages are written in
C/C++.

  So is it possible to add a -j4 flag to the make command when I use the
install.packages() function?
  That would probably speed up the package installation process by about 390%.

Regards
Martin M. S. Pedersen



Re: [R] Regarding Principal Component Analysis result Interpretation

2017-09-15 Thread Suzen, Mehmet
Usually, PCA is used for a large number of features. The FactoMineR [1]
package provides a couple of examples; check the temperature example.
But you may want to consult basic PCA material as well; I suggest the
book by Chris Bishop [2].


[1] https://cran.r-project.org/web/packages/FactoMineR/vignettes/clustering.pdf
[2] http://www.springer.com/de/book/9780387310732?referer=www.springer.de
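
A minimal sketch of the kind of call meant here (assuming FactoMineR is
installed):

library(FactoMineR)
res <- PCA(iris[, 1:4], graph = FALSE)
res$var$cor  # correlations between the original variables and the PCs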



Re: [R] require help

2017-09-15 Thread Ismail SEZEN

> On 15 Sep 2017, at 12:38, yadav neog  wrote:
> 
> hello to all. I am working on macroeconomic data series of India, which in
> a yearly basis. I am unable to convert my data frame into time series.


Do you really need to convert your data to a time series/xts/zoo? I don’t know
what kind of analysis you are trying to do, but perhaps you don’t have to.

> kindly help me.
> also using zoo and xts packages. but they take only monthly observations.

If you really have to convert to xts/zoo, why don’t you set each year to the
first day of January and use it as is? For instance,

index, cnsm, incm, wlth
1980-01-01, 174, 53.4, 60.3
1981-01-01, 175, 53.7, 60.5
1982-01-01, 175, 53.5, 60.2
…..
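
A small sketch of that idea (assuming the original data frame is called df; note
that zoo will also accept the plain numeric year as an index):

library(zoo)
z1 <- zoo(df[, c("cnsm", "incm", "wlth")],
          order.by = as.Date(paste0(df$year, "-01-01")))
# or, keeping the year itself as the index
z2 <- zoo(df[, c("cnsm", "incm", "wlth")], order.by = df$year)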

> 
> 'data.frame': 30 obs. of  4 variables:
> $ year: int  1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 ...
> $ cnsm: num  174 175 175 172 173 ...
> $ incm: num  53.4 53.7 53.5 53.2 53.3 ...
> $ wlth: num  60.3 60.5 60.2 60.1 60.7 ...
> -- 
> Yadawananda Neog
> Research Scholar
> Department of Economics
> Banaras Hindu University
> Mob. 9838545073
> 


Re: [R] require help

2017-09-15 Thread Eric Berger
You did not provide the data frame so I will first create one and then use
it to create an xts

library(xts)
df <- data.frame( year=1980:2009, cnsm=sample(170:180,30,replace=TRUE),
  incm=rnorm(30,53,1), wlth=rnorm(30,60,1))
dates <- as.Date(paste(df$year,"-01-01",sep=""))
myXts <- xts(df,order.by=dates)



On Fri, Sep 15, 2017 at 12:38 PM, yadav neog  wrote:

> hello to all. I am working on macroeconomic data series of India, which in
> a yearly basis. I am unable to convert my data frame into time series.
> kindly help me.
> also using zoo and xts packages. but they take only monthly observations.
>
> 'data.frame': 30 obs. of  4 variables:
>  $ year: int  1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 ...
>  $ cnsm: num  174 175 175 172 173 ...
>  $ incm: num  53.4 53.7 53.5 53.2 53.3 ...
>  $ wlth: num  60.3 60.5 60.2 60.1 60.7 ...
> --
> Yadawananda Neog
> Research Scholar
> Department of Economics
> Banaras Hindu University
> Mob. 9838545073
>
>



[R] Regarding Principal Component Analysis result Interpretation

2017-09-15 Thread Shylashree U.R
Dear Sir/Madam,

I am trying to do a PCA analysis with the "iris" dataset and trying to interpret
the result. The dataset contains 150 obs. of 5 variables:

     Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
1             5.1          3.5           1.4          0.2  setosa
2             4.9          3.0           1.4          0.2  setosa
.
.
150           5.9          3.0           5.1          1.8  virginica

Now I used the 'prcomp' function on the dataset and got the following result:
> print(pc)
Standard deviations (1, .., p=4):
[1] 1.7083611 0.9560494 0.3830886 0.1439265

Rotation (n x k) = (4 x 4):
PC1 PC2PC3PC4
Sepal.Length  0.5210659 -0.37741762  0.7195664  0.2612863
Sepal.Width  -0.2693474 -0.92329566 -0.2443818 -0.1235096
Petal.Length  0.5804131 -0.02449161 -0.1421264 -0.8014492
Petal.Width   0.5648565 -0.06694199 -0.6342727  0.5235971

I'm planning to use PCA as a feature-selection process and to remove variables
which are correlated in my project. I have interpreted the PCA result, but I am
not sure whether my interpretation is correct or wrong.
If you can correct me it will be of great help.
Looking at the PC results, I found both positive and negative values.



[R] require help

2017-09-15 Thread yadav neog
Hello to all. I am working on a macroeconomic data series for India, which is on
a yearly basis. I am unable to convert my data frame into a time series.
Kindly help me.
I am also using the zoo and xts packages, but they take only monthly observations.

'data.frame': 30 obs. of  4 variables:
 $ year: int  1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 ...
 $ cnsm: num  174 175 175 172 173 ...
 $ incm: num  53.4 53.7 53.5 53.2 53.3 ...
 $ wlth: num  60.3 60.5 60.2 60.1 60.7 ...
-- 
Yadawananda Neog
Research Scholar
Department of Economics
Banaras Hindu University
Mob. 9838545073



[R] Calculating Weeks Since Last Event

2017-09-15 Thread Abhinaba Roy
Hi,

I have an input data

> dput (input)

structure(list(ScanDate = structure(c(16433, 16440, 16447, 16454,
16461, 16468, 16475, 16482, 16489, 16496, 16503, 16510, 16517,
16524, 16531, 16538, 16545, 16552, 16559, 16566, 16573, 16580,
16587, 16594, 16601, 16608, 16615, 16622), class = "Date"), OnPromotion =
c(0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0,
0, 0, 1, 1, 1, 1)), .Names = c("ScanDate", "OnPromotion"), sorted =
"ScanDate", class = c("data.table",
"data.frame"), row.names = c(NA, -28L))

I am looking for an output

> dput(output)

structure(list(ScanDate = structure(c(16433, 16440, 16447, 16454,
16461, 16468, 16475, 16482, 16489, 16496, 16503, 16510, 16517,
16524, 16531, 16538, 16545, 16552, 16559, 16566, 16573, 16580,
16587, 16594, 16601, 16608, 16615, 16622), class = "Date"), OnPromotion =
c(0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0,
0, 0, 1, 1, 1, 1), Weeks_Since_Last_Promo = c(0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 3, 4, 1,
1, 1)), .Names = c("ScanDate", "OnPromotion", "Weeks_Since_Last_Promo"
), sorted = "ScanDate", class = c("data.table", "data.frame"), row.names =
c(NA,
-28L))

The logic :

The data is weekly.

I want to calculate the number of weeks elapsed since the last promotion
(OnPromotion : 1 indicates promotion for that week and 0 indicates no
promotion).

As there are no promotions initially, we set the value of
'Weeks_Since_Last_Promo' to 0 (zero). The first promo occurs on
'2015-03-02' and 'Weeks_Since_Last_Promo' is still 0. Moving to
'2015-03-09', there was a promotion the week before, so 1 week has elapsed
since the last promo.

If we look at '2015-06-15' then there was a promo 4 weeks back in the week
of '2015-05-18' and so 'Weeks_Since_Last_Promo' = 4.

How can we do it in R?

Thanks,
Abhinaba



Re: [R] compounding precipitation based on whether falls within a day

2017-09-15 Thread Eric Berger
Hi Eric,
Bert's solution is very elegant. His final comment prompted me to check out
the aperm() function, which I had never used.
The final step to complete his response is:

prec_daily2 <- aperm(prec_daily, c(3,1,2))
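
For what it's worth, the grouping and summing can also be done in one step with
rowsum(), reusing Bert's 'date' and 'prec' objects (a sketch; it reproduces his
prec_daily before the same final aperm()):

nd <- length(unique(date))
prec_daily <- array(t(rowsum(matrix(prec, nrow = dim(prec)[1]), date)),
                    dim = c(3, 4, nd))
prec_daily2 <- aperm(prec_daily, c(3, 1, 2))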

Regards


On Wed, Sep 13, 2017 at 9:06 PM, Bert Gunter  wrote:

> Thanks for the reprex. Wouldn't have bothered without it.
>
> The following is I believe **almost** what you want. It seems a bit clumsy
> to me, so others may provide you something neater. But anyway...
>
> ## Convert POSIXct vector to dates
> ## There are 22 different days, not 21
> date <- as.Date(prec_idx)
>
> ## Sum results by date at each i,j of the last 2 array dimensions
> z <- lapply(unique(date), function(d)
>   apply(prec[date == d, , ], 2:3, sum)
> )
>
> ## This gives a list with 22 3x4 matrices of sums.
> ## Convert to 3x4x22 array with
>
> prec_daily <- array(unlist(z),dim=c(3,4,22))
>
> ## This is the **almost** part. You can use the aperm() function to reshape
> the array if you like. I leave those pleasures to you.
>
> HTH.
>
> Cheers,
> Bert
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Wed, Sep 13, 2017 at 9:52 AM, Morway, Eric  wrote:
>
> > Using the small reproducible example below, I'm wondering how best to
> > complete the following task:
> >
> > In the small reproducible example below, the 3D array prec has indexes that
> > correspond to time, x, y (i.e., prec[time, x, y]).  In this case, the time
> > index is hours since some predefined start time.  I'd like to add up all
> > the time indexes in 'prec' based on whether or not the corresponding hours
> > fall within a given day.  So, at the end of the small example below, there
> > are two variables that I'm left with, prec_idx (an hourly sequence from
> > beg_time to end_time) whose length is equal to the first index (the time
> > index) of the 3D array of precipitation called prec.  That is, I'd like to
> > get a 3D array called prec_daily that has dimension prec_daily[21, 3, 4],
> > where 21 is the number of days and the value in, say, prec_daily[1,x,y] is
> > equal to prec[1,x,y] + prec[2,x,y] + ... + prec[24,x,y]
> >
> >
> > ndays <- 21
> > base_time <- as.character('2001-12-31T23:00:00Z')
> > hrs_since_base <- 1
> >
> > # adding an extra second to the end b/c I'm paranoid about the midnight
> > time stamp not being explicit
> > beg_time <- as.POSIXct(base_time, format = "%Y-%m-%dT%H:%M:%SZ") +
> > (hrs_since_base * 60 * 60) + 1
> >
> > max_hr_since <- 24 * ndays
> > end_time <- as.POSIXct(base_time, format = "%Y-%m-%dT%H:%M:%SZ") +
> > (max_hr_since * 60 * 60) + 1
> >
> > prec_idx <- seq(beg_time, end_time, by='hour')
> >
> > prec <- array(abs(rnorm((24*ndays) * 3 * 4)) , dim=c(24*ndays, 3, 4))
> >
> > length(prec_idx)
> > # 504
> > dim(prec)
> > # 504   3   4
> >
> > # How do I aggregate prec to get daily sums of precipitation based on the
> > prec_idx array?
> >



[R] [R-pkgs] New CRAN Package Announcement: asciiSetupReader

2017-09-15 Thread Jacob Kaplan
I'm pleased to announce that asciiSetupReader is now on CRAN:
https://cran.r-project.org/web/packages/asciiSetupReader/index.html

This package allows users to read ASCII files that have an SPSS or SAS
setup file (.sps or .sas). Datasets that come in these txt-sps and txt-sas
pairs are now accessible through R. The function has the option of
correcting value labels (e.g. 1 to Male, 2 to Female) and column names
(e.g. V1 to Sex). You may also select only certain columns to read in, which
is helpful when dealing with very large data sets.

A vignette is available explaining how to use the package with SPSS setup
files. The process is the same as for SAS setup files.

Please let me know if you find any bugs or problems in the package.
https://github.com/jacobkap/asciiReader/issues

Jacob
