Re: [R] scaling loess curves

2015-09-10 Thread Bogdan Tanasa
Hi Petr,

Thank you for your reply regarding the scaling of loess curves. Our
situation is the following:

We have 2 experiments, and for each experiment the data are in
the following format: "node A (chr, start, end) - node B (chr, start, end)
- interaction intensity (between A and B)".

We are trying to SCALE the LOESS curves (for the graphs "distance between
node A and node B" vs. "intensity") for experiment 1 vs. experiment 2, in
order to make the experiments directly comparable.
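One possible way to put two experiments on a common scale is to divide each experiment's intensities by a per-experiment constant (here, its total intensity) before fitting the loess curves. Nobody in this thread proposed this; it is only a sketch. The data are simulated, only the column names `distance` and `intensity` come from the code in this thread, and `normalize_total()` is a hypothetical helper.

```r
# Simulated stand-ins for the two experiments: exp2 is exp1 times a constant
set.seed(2)
d <- sort(runif(200, 1e4, 1e6))
exp1 <- data.frame(distance = d, intensity = 1e5 / d + rnorm(200, sd = 0.02))
exp2 <- data.frame(distance = d, intensity = 4 * exp1$intensity)

# Hypothetical helper (an assumed normalization, not the thread's method):
# divide intensities by the experiment's total so they sum to 1
normalize_total <- function(df) {
  df$scaled <- df$intensity / sum(df$intensity)
  df
}
exp1 <- normalize_total(exp1)
exp2 <- normalize_total(exp2)

# Fit one loess curve per experiment on the normalized scale
# (span = 0.3 is an arbitrary choice for this small simulated set)
fit1 <- loess(scaled ~ distance, data = exp1, span = 0.3)
fit2 <- loess(scaled ~ distance, data = exp2, span = 0.3)
```

After this normalization, any remaining difference between the two curves cannot come from an overall intensity factor. Whether total-intensity normalization is appropriate for 5C data depends on the analysis.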

I have attached 2 figures with the LOESS curves for experiment 1 and
experiment 2 to my email. Should you have any suggestions, please let me
know. Thanks a lot,


-- bogdan
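Petr's suggestion of xlim/ylim amounts to drawing both curves on identical axes. Here is a minimal base-R sketch of that. The data are simulated (`make_exp()` is a hypothetical helper), and the span of 0.3 replaces the thread's 0.01, which would be too small for this little data set. Both fits are predicted on a shared distance grid, and both panels use the same `xlim` and `ylim`.

```r
# Hypothetical simulated data standing in for the files "a" and "b"
set.seed(1)
make_exp <- function(n, scale) {
  distance <- sort(runif(n, 1e4, 1e6))
  intensity <- scale * 1e5 / distance + rnorm(n, sd = 0.05 * scale)
  data.frame(distance = distance, intensity = intensity)
}
a <- make_exp(300, scale = 1)
b <- make_exp(300, scale = 3)

# One loess fit per experiment
fit_a <- loess(intensity ~ distance, data = a, span = 0.3)
fit_b <- loess(intensity ~ distance, data = b, span = 0.3)

# Predict on a grid covering the distance range both experiments share
grid <- data.frame(distance = seq(max(min(a$distance), min(b$distance)),
                                  min(max(a$distance), max(b$distance)),
                                  length.out = 200))
pred_a <- predict(fit_a, newdata = grid)
pred_b <- predict(fit_b, newdata = grid)

# Common limits, so both panels are drawn on identical scales
xlim_common <- range(grid$distance)
ylim_common <- range(pred_a, pred_b, na.rm = TRUE)

op <- par(mfrow = c(1, 2))
plot(grid$distance, pred_a, type = "l", xlim = xlim_common, ylim = ylim_common,
     xlab = "distance", ylab = "intensity", main = "experiment 1")
plot(grid$distance, pred_b, type = "l", xlim = xlim_common, ylim = ylim_common,
     xlab = "distance", ylab = "intensity", main = "experiment 2")
par(op)
```

With ggplot2, you could add the same `coord_cartesian(xlim = xlim_common, ylim = ylim_common)` to both qplot() calls instead. Unlike `xlim()`/`ylim()`, it zooms the view without dropping out-of-range points before the smoother is fitted.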

On Mon, Sep 7, 2015 at 7:34 AM, PIKAL Petr  wrote:

> Hi
>
> what about xlim or ylim?
>
> Cheers
> Petr
>
>
> > -Original Message-
> > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Bogdan
> > Tanasa
> > Sent: Monday, September 07, 2015 8:00 AM
> > To: r-help
> > Subject: [R] scaling loess curves
> >
> > Dear all,
> >
> > could you please advise on a method to scale 2 plots of LOESS
> > curves?
> > More specifically, we have 2 sets of 5C data, and the loess plots
> > reflect the relationship between INTENSITY and DISTANCE (please see the
> > R code below).
> >
> > I am looking for a method/formula to scale these 2 LOESS plots and make
> > them directly comparable.
> >
> > many thanks,
> >
> > -- bogdan
> >
> >
> >
> > -- the R code --
> >
> >
> >
> > library(ggplot2)
> >
> > a <- read.delim("a", header = TRUE)
> > qplot(data = a, distance, intensity) +
> >   geom_smooth(method = "loess", size = 1, span = 0.01) +
> >   xlab("distance") + ylab("intensity")
> >
> > b <- read.delim("b", header = TRUE)
> > qplot(data = b, distance, intensity) +
> >   geom_smooth(method = "loess", size = 1, span = 0.01) +
> >   xlab("distance") + ylab("intensity")
> >
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> > guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

Re: [R] Generate a vector of values, given a vector of keys and a table?

2015-09-10 Thread Bert Gunter
?match

as in:

> y <- lk_up[match(x,lk_up[,"key"]),"val"]
> y
 [1] "1" "1" "1" "1" "15000" "15000" "2"
 [8] "2" "2" "2"
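Here is a self-contained version of Bert's one-liner. It rebuilds `lk_up` and `x` from David's dput() output and adds a named-vector variant that does the same lookup. Both rely only on long-standing base R functions, so they should work in R 3.0.2.

```r
# lk_up and x rebuilt from David's dput() output
lk_up <- structure(c("1.1", "1.9", "1.4", "1.5", "1.15", "1", "1",
                     "15000", "2", "25000"), .Dim = c(5L, 2L),
                   .Dimnames = list(NULL, c("key", "val")))
x <- c("1.9", "1.9", "1.1", "1.1", "1.4", "1.4", "1.5", "1.5", "1.5", "1.5")

# match() gives, for each element of x, the row of its first hit in the key column
idx <- match(x, lk_up[, "key"])
y <- lk_up[idx, "val"]
# y is c("1", "1", "1", "1", "15000", "15000", "2", "2", "2", "2")

# The same lookup through a named vector (names = keys, values = vals)
val_by_key <- setNames(lk_up[, "val"], lk_up[, "key"])
y2 <- unname(val_by_key[x])

# A key that is not in the table comes back as NA rather than an error
match("9.9", lk_up[, "key"])
```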



Bert



Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll




[R] Generate a vector of values, given a vector of keys and a table?

2015-09-10 Thread David Wolfskill
I apologize in advance: I must be overlooking something quite simple,
but I'm failing to make progress.

Suppose I have a "lookup table":

Browse[2]> dput(lk_up)
structure(c("1.1", "1.9", "1.4", "1.5", "1.15", "1", "1", 
"15000", "2", "25000"), .Dim = c(5L, 2L), .Dimnames = list(
NULL, c("key", "val")))
Browse[2]> lk_up
     key    val
[1,] "1.1"  "1"
[2,] "1.9"  "1"
[3,] "1.4"  "15000"
[4,] "1.5"  "2"
[5,] "1.15" "25000"

and a vector whose elements correspond with the "key" column of the
table:

Browse[2]> dput(x)
c("1.9", "1.9", "1.1", "1.1", "1.4", "1.4", "1.5", "1.5", "1.5", 
"1.5")
Browse[2]> x
 [1] "1.9" "1.9" "1.1" "1.1" "1.4" "1.4" "1.5" "1.5" "1.5" "1.5"
Browse[2]> 

Is there a (relatively) simple (i.e., not explicitly looping) construct
that will yield a vector of the same size and shape as "x", but contain
the "value" entries from the lookup table (preserving the sequence: the
1st entry of the result must correspond to the 1st entry of the list of
keys) -- in the current example:

Browse[2]> dput(y)
c("1", "1", "1", "1", "15000", "15000", "2", 
"2", "2", "2")
Browse[2]> y
 [1] "1" "1" "1" "1" "15000" "15000" "2" "2" "2" "2"
Browse[2]> 

I am (unfortunately) presently limited to R-3.0.2.

Thanks

Peace,
david
-- 
David H. Wolfskill  r...@catwhisker.org
Those who would murder in the name of God or prophet are blasphemous cowards.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.



Re: [R] [FORGED] Help with Binning Data

2015-09-10 Thread David Winsemius

On Sep 10, 2015, at 5:25 PM, Achim Zeileis wrote:

> On Fri, 11 Sep 2015, Rolf Turner wrote:
> 
>> On 11/09/15 11:57, David Winsemius wrote:
>> 
>> 
>> 
>>> The urge to imitate other statistical package that rely on profusion
>>> of dummies should be resisted. R repression functions can handle
>>> factor variables 
>> 
>> 
>> 
>> Fortune? :-)
> 
> Nice! Should I include the "repression" typo? :-)
> 

 Er, maybe not. Or the package[s] error.

Whatever;
David.

> Best,
> Z
> 
>> cheers,
>> 
>> Rolf
>> 
>> -- 
>> Technical Editor ANZJS
>> Department of Statistics
>> University of Auckland
>> Phone: +64-9-373-7599 ext. 88276
>> 

David Winsemius
Alameda, CA, USA



Re: [R] [FORGED] Re: Help with Binning Data

2015-09-10 Thread Rolf Turner

On 11/09/15 12:25, Achim Zeileis wrote:
> On Fri, 11 Sep 2015, Rolf Turner wrote:
>
>> On 11/09/15 11:57, David Winsemius wrote:
>>
>>> The urge to imitate other statistical package that rely on profusion
>>> of dummies should be resisted. R repression functions can handle
>>> factor variables
>>
>> Fortune? :-)
>
> Nice! Should I include the "repression" typo? :-)

Yes!  That's the point, from my point of view! :-)

cheers,

Rolf

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276



Re: [R] [FORGED] Re: Help with Binning Data

2015-09-10 Thread Achim Zeileis

On Fri, 11 Sep 2015, Rolf Turner wrote:

> On 11/09/15 11:57, David Winsemius wrote:
>
>> The urge to imitate other statistical package that rely on profusion
>> of dummies should be resisted. R repression functions can handle
>> factor variables
>
> Fortune? :-)

Nice! Should I include the "repression" typo? :-)

Best,
Z

> cheers,
>
> Rolf
>
> --
> Technical Editor ANZJS
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276





Re: [R] [FORGED] Re: Help with Binning Data

2015-09-10 Thread Rolf Turner

On 11/09/15 11:57, David Winsemius wrote:

> The urge to imitate other statistical package that rely on profusion
> of dummies should be resisted. R repression functions can handle
> factor variables

Fortune? :-)

cheers,

Rolf

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276



Re: [R] Help with Binning Data

2015-09-10 Thread David Winsemius

On Sep 10, 2015, at 3:28 PM, Shouro Dasgupta wrote:

> Dear all,
> 
> I have 3-hourly temperature data from 1970-2010 for 122 cities in the US. I
> would like to bin this data by city-year-week. My idea is if the
> temperature for a particular city in a given week falls within a given
> range (-17.78 & -12.22), (-12.22 & -6.67), ... (37.78 & 43.33), then the
> corresponding bin would have a value of 1 and 0 otherwise.
> 
> The data looks like this. Basically, I need to generate a dummy variable
> for each temperature range. Any help will be greatly appreciated.

The urge to imitate other statistical package that rely on profusion of dummies 
should be resisted. R repression functions can handle factor variables and the 
`cut` function can deliver them along with appropriate use of `seq`:

  tmp2$Tcat <- cut( tmp2$avsft, breaks=seq (-17.78,  43.33, by= 5.55 ) )

> tmp2$Tcat
 [1] (-12.2,-6.68] (-17.8,-12.2] (-12.2,-6.68] (-6.68,-1.13]
 [5] (-1.13,4.42]  (4.42,9.97]   (-6.68,-1.13] (4.42,9.97]  
 [9] (9.97,15.5]   (-1.13,4.42] 
11 Levels: (-17.8,-12.2] (-12.2,-6.68] ... (37.7,43.3]


> tmp2[ , c("City", "Tcat")]
          City          Tcat
1        AKRON (-12.2,-6.68]
2       ALBANY (-17.8,-12.2]
3  ALBUQUERQUE (-12.2,-6.68]
4    ALLENTOWN (-6.68,-1.13]
5      ATLANTA  (-1.13,4.42]
6       AUSTIN   (4.42,9.97]
7    BALTIMORE (-6.68,-1.13]
8  BATON ROUGE   (4.42,9.97]
9     BERKELEY   (9.97,15.5]
10  BIRMINGHAM  (-1.13,4.42]

Must have been a cold snap in the southeast that New Year's Day.
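The ranges in the question look like 0, 10, ..., 110 degrees Fahrenheit converted to Celsius. That is an inference from the numbers, not something the post states. If it is right, computing the breaks exactly avoids the small drift from `by = 5.55`. The true step is 50/9 (about 5.556), so `seq(-17.78, 43.33, by = 5.55)` ends at 43.27 rather than 43.33. The `avsft` values below are the ten readings from the posted dput() output.

```r
# Ten avsft readings from the posted dput() output
avsft <- c(-7.81, -16.06, -7.719997, -1.88, 2.93, 5.12,
           -5.029997, 9.330004, 15.08, 2.890004)

# Assumed: bin edges are 0, 10, ..., 110 degrees F expressed in Celsius
breaks_c <- (seq(0, 110, by = 10) - 32) * 5 / 9
round(breaks_c, 2)   # -17.78 -12.22 -6.67 ... 37.78 43.33

# A fixed step of 5.55 falls slightly short of the requested upper edge
approx_breaks <- seq(-17.78, 43.33, by = 5.55)
max(approx_breaks)   # 43.27, so readings in (43.27, 43.33] would become NA

Tcat <- cut(avsft, breaks = breaks_c)
table(Tcat)
```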


Isn't that much neater than having a messy bunch of dummies? If you
really need to build them, then look at `?model.frame`.
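If the 0/1 columns really are needed, `model.matrix()` (which works from a model frame) will build one indicator column per level once the intercept is dropped. The per-week step is only a sketch. The grouping vector `grp` is made up and stands in for something like `paste(City, yearweek)` on the full data. `rowsum(...) > 0` marks a bin as 1 if any reading in that group fell into it, which is one reading of the original request.

```r
avsft <- c(-7.81, -16.06, -7.719997, -1.88, 2.93, 5.12,
           -5.029997, 9.330004, 15.08, 2.890004)
breaks_c <- (seq(0, 110, by = 10) - 32) * 5 / 9   # assumed Fahrenheit-based edges
Tcat <- cut(avsft, breaks = breaks_c)

# "- 1" drops the intercept, so every level of Tcat gets its own 0/1 column
dummies <- model.matrix(~ Tcat - 1)
dim(dummies)   # 10 rows, 11 columns

# Hypothetical grouping: pretend the ten readings come from two city-weeks
grp <- rep(c("A_197001", "B_197001"), each = 5)

# rowsum() adds the indicator columns within each group; > 0 turns counts into 1/0
per_week <- (rowsum(dummies, grp) > 0) * 1
per_week
```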

-- 
David. 

Re: [R] Help with Binning Data

2015-09-10 Thread Bert Gunter
1. Posting in HTML largely negated your ability to provide data
through dput(). Follow the posting guide and post in PLAIN TEXT only,
please.

2. See ?cut. I think this will at least get you started.

Cheers,
Bert
Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Thu, Sep 10, 2015 at 3:28 PM, Shouro Dasgupta  wrote:
> Dear all,
>
> I have 3-hourly temperature data from 1970-2010 for 122 cities in the US. I
> would like to bin this data by city-year-week. My idea is if the
> temperature for a particular city in a given week falls within a given
> range (-17.78 & -12.22), (-12.22 & -6.67), ... (37.78 & 43.33), then the
> corresponding bin would have a value of 1 and 0 otherwise.
>
> The data looks like this. Basically, I need to generate a dummy variable
> for each temperature range. Any help will be greatly appreciated.

[R] Help with Binning Data

2015-09-10 Thread Shouro Dasgupta
Dear all,

I have 3-hourly temperature data from 1970-2010 for 122 cities in the US. I
would like to bin this data by city-year-week. My idea is if the
temperature for a particular city in a given week falls within a given
range (-17.78 & -12.22), (-12.22 & -6.67), ... (37.78 & 43.33), then the
corresponding bin would have a value of 1 and 0 otherwise.

The data looks like this. Basically, I need to generate a dummy variable
for each temperature range. Any help will be greatly appreciated.

# dput(head(tmp1, 10)) printed the structure below; assigning it recreates tmp2:
tmp2 <- structure(list(yearday = c(1970001L, 1970001L, 1970001L, 1970001L,
1970001L, 1970001L, 1970001L, 1970001L, 1970001L, 1970001L),
City = structure(1:10, .Label = c("AKRON", "ALBANY", "ALBUQUERQUE",
"ALLENTOWN", "ATLANTA", "AUSTIN", "BALTIMORE", "BATON ROUGE",
"BERKELEY", "BIRMINGHAM", "BOISE", "BOSTON", "BRIDGEPORT",
"BUFFALO", "CAMBRIDGE", "CAMDEN", "CANTON", "CHARLOTTE",
"CHATTANOOGA", "CHICAGO", "CINCINNATI", "CLEVELAND", "COLORADO SPRINGS",
"COLUMBUS", "CORPUS CHRISTI", "DALLAS", "DAYTON", "DENVER",
"DES MOINES", "DETROIT", "DULUTH", "EL PASO", "ELIZABETH",
"ERIE", "EVANSVILLE", "FALL RIVER", "FLINT", "FORT WAYNE",
"FRESNO", "FT WORTH", "GARY", "GLENDALE", "GRAND RAPIDS",
"HARTFORD", "HONOLULU", "HOUSTON", "INDIANAPOLIS", "JACKSONVILLE",
"JERSEY CITY", "KANSAS CITY", "KANSAS ITY", "KNOXVILLE",
"Lansing ", "LAS VEGAS", "LEXINGTON", "LINCOLN", "LITTLE ROCK",
"LONG BEACH", "LOS ANGELES", "LOUISVILLE", "LOWELL", "LYNN",
"MADISON", "MEMPHIS", "MIAMI", "MILWAUKEE", "MINNEAPOLIS",
"MOBILE", "MONTGOMERY", "NASHVILLE", "NEW BEDFORD", "NEW HAVEN",
"NEW ORLEANS", "NEW YORK CITY", "NEWARK", "NORFOLK", "OAKLAND",
"OGDEN", "OKLAHOMA CITY", "OMAHA", "PASADENA", "PATERSON",
"PEORIA", "PHILADELPHIA", "PHOENIX", "PITTSBURG", "PORTLAND",
"PROVIDENCE", "PUEBLO", "READING", "RICHMOND", "ROCHESTER",
"ROCKFORD", "SACRAMENTO", "SALT LAKE CITY", "SAN ANTONIO",
"SAN CRUZ", "SAN DIEGO", "SAN FRANCISCO", "SAN JOSE", "SAVANNAH",
"SCHENECTADY", "SCRANTON", "SEATTLE", "SHREVEPORT", "SOMERVILLE",
"SOUTH BEND", "SPOKANE", "SPRINGFIELD", "ST LOUIS", "ST PAUL",
"ST PETERSBURG", "SYRACUSE", "TACOMA", "TAMPA", "TOLEDO",
"TRENTON", "TUCSON", "TULSA", "UTICA", "WASHINGTON", "WATERBURY",
"WICHITA", "WILMINGTON", "WORCESTER", "YONKERS", "YOUNGSTOWN"
), class = "factor"), cell_number = c(17379L, 17027L, 19514L,
17745L, 20256L, 21323L, 18104L, 21329L, 18779L, 20254L),
longitude = c(-81.519005, -73.756232, -106.609991, -75.490183,
-84.387982, -97.743061, -76.612189, -91.14032, -121.635963,
-86.80249), latitude = c(41.081445, 42.652579, 35.110703,
40.608431, 33.748995, 30.267153, 39.290385, 30.458283, 37.871744,
33.520661), State = structure(c(29L, 28L, 27L, 32L, 10L,
35L, 19L, 17L, 4L, 1L), .Label = c(" ALA", " ARIZ", " ARK",
" CAL", " COLO", " CONN", " DC", " DEL", " FLA", " GA", " HAWAII",
" ILL", " IND", " IOWA", " KANS", " KY", " LA", " MASS",
" MD", " MICH", " MINN", " MO", " NC", " NEBR", " NEV", " NJ",
" NM", " NY", " OHIO", " OKLA", " ORE", " PA", " RI", " TENN",
" TEX", " UTAH", " VA", " WASH", " WIS", "CAL", "CONN", "IDAH",
"KY", "MASS"), class = "factor"), avsft = c(-7.81, -16.06,
-7.719997, -1.88, 2.93, 5.12,
-5.029997, 9.330004, 15.08, 2.890004
), year = c(1970L, 1970L, 1970L, 1970L, 1970L, 1970L, 1970L,
1970L, 1970L, 1970L), day = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), hour = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), yearweek = c(197001L, 197001L, 197001L, 197001L, 197001L,
197001L, 197001L, 197001L, 197001L, 197001L), week = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("yearday",
"City", "cell_number", "longitude", "latitude", "State", "avsft",
"year", "day", "hour", "yearweek", "week"), row.names = c(NA,
10L), class = "data.frame")


Sincerely,

Shouro



Re: [R] help with svychisq

2015-09-10 Thread Anthony Damico
could you try this, and then not use factor(age) elsewhere?

sv1 <- update( sv1 , age = factor( age ) )

if that doesn't work, is it possible for you to share a reproducible
example? thanks
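Here is a base-R illustration of why `factor(age)` makes the empty row disappear: calling `factor()` on an existing factor keeps only the levels that actually occur, as does `droplevels()`. The ages below are made up. Anthony's `update(sv1, age = factor(age))` stores that re-leveled variable in the design itself, so later calls such as `svychisq(~age + ETN, design = sv1)` can use plain `age`. The traceback hints that svychisq() looks the formula terms up as column names, and there is no column literally called `factor(age)`, but that reading of the error is an inference.

```r
# Made-up ages; breaks follow the categories in the question
age <- cut(c(20, 30, 50, 40, 22), breaks = c(18, 25, 45, 65, 85), right = FALSE)
levels(age)    # four levels, including the unused "[65,85)"
table(age)     # the last count is 0

age2 <- factor(age)          # re-encoding drops levels with no observations
levels(age2)                 # "[18,25)" "[25,45)" "[45,65)"
identical(levels(droplevels(age)), levels(age2))   # TRUE
```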



[R] help with svychisq

2015-09-10 Thread Emanuele Mazzola
Hello,

I’m having a weird issue with the function “svychisq” in package “survey”,
which would be very helpful for me in this case.

I’m tabulating age categories (a factor variable subdivided into 4 categories: 
[18,25), [25, 45), [45,65), [65, 85) ) with respect to ethnicity/race (another 
factor variable subdivided into “hispanic white”, “non hispanic black”, 
“hispanic black”).

I’m perfectly able to get to the “svytable" object, which looks like this

> svytable(~age+ETN, design=sv1)
         ETN
age       hisp black hispanic white non hisp black
  [18,25)   26.97019      798.87444      183.61834
  [25,45)  145.19650     4783.47678      854.82748
  [45,65)  104.83682     2537.15021      595.04924
  [65,85]        0.0            0.0            0.0

Since its last row is all zeros (which would cause trouble for the
corresponding chi-square p-value), I tried to get rid of it by using

> svytable(~factor(age)+ETN, design=sv1)
           ETN
factor(age) hisp black hispanic white non hisp black
  [18,25)     26.97019      798.87444      183.61834
  [25,45)    145.19650     4783.47678      854.82748
  [45,65)    104.83682     2537.15021      595.04924

which is exactly what I’m looking for and what I expected.

The design is built by using

sv1 = svydesign(ids=~factor(age)+ETN, weights=~WTFA.n, data=totfor)

Now, to evaluate the corresponding weighted chi-squared test, I
use

svychisq(~factor(age)+ETN, design=sv1)

but here’s what I get from R: 

> svychisq(~factor(age)+ETN, design=sv1)
Error in `[.data.frame`(design$variables, , as.character(rows)) : 
  undefined columns selected

Maybe it is a stupid question but I really can’t figure out where the error is.

Could you please help me with this?
Thanks in advance for any information you will provide me with!

Emanuele

***
Emanuele Mazzola, Ph.D.
Department of Biostatistics & Computational Biology
Dana-Farber Cancer Institute
450 Brookline Ave 
Mail Location: LC1056 
Office Location: Longwood Center, Room 1056
Boston, MA 02215
Office phone 617-582-7614
Fax 617-632-2516



Re: [R] ggplot2 will not install after system upgrade

2015-09-10 Thread Paul Bivand
This is most likely the "stringi" dependency, which is new.

Follow the links from the CRAN page for "stringi" and you may find some
guidance.

I initially had the same problem with my Mageia install, but it's sorted
now.
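Ista's suggestion quoted below, `update.packages(ask = FALSE, checkBuilt = TRUE)`, reinstalls packages that were built under an older version of R. The sketch below lists those packages yourself. It would not catch the case Paul describes: stringi built under the current R but linked against a system ICU library that has since changed. There, reinstalling stringi so it is rebuilt against the ICU now on the system (`install.packages("stringi")`) is likely the more direct fix.

```r
# Installed packages, including the R version each was built under ("Built")
ip <- installed.packages()

# Major.minor of the running R, e.g. "3.2"
current <- paste(R.version$major,
                 strsplit(R.version$minor, ".", fixed = TRUE)[[1]][1],
                 sep = ".")

# Major.minor from each package's "Built" field, e.g. "3.1.3" -> "3.1"
built_mm <- sub("^([0-9]+\\.[0-9]+).*$", "\\1", ip[, "Built"])

# Packages built under a different major.minor than the running R
stale <- unique(unname(ip[which(built_mm != current), "Package"]))
stale
```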

On 3 September 2015 at 22:53, Jeff Trefftzs  wrote:

> On Thu, 2015-09-03 at 16:47 -0400, Ista Zahn wrote:
> > Hi Jeff,
> > Your chances of getting a useful response will increase if you
> > provide
> > some additional information. For example, which version of R? Which
> > version of ggplot2?
>
>
> Sorry.  R was version 3.2.1
> ggplot2 1.0.1
>
> > What sequence of commands produces the error?
>
> install.packages("ggplot2") (or various equivalents while in RStudio)
>
> > What
> > _exactly_ does the error message say?
>
> I was working on my laptop, where I didn't have email enabled, so I was
> unable to cut & paste all the output.  The last bit of the error
> messages boiled down to "unable to find libicui18n.so.50.  No such file
> or directory"
>
> Does
> > update.packages(ask=FALSE, checkBuilt=TRUE)
> > install.packages("ggplot2")
>
> I hadn't tried that.
>
> Follow-up:  On the laptop I downgraded to R-3.1.3 and things worked
> again.  The various error messages I got were confusing.  When I tried
> to install ggplot2 from the first US mirror (the https server at
> Berkeley), it told me "ggplot2 not available for R 3.2.1".  When I tried
> one of the other servers (e.g., the other Berkeley server, or the UCLA
> server) it would download, and come to grief with the libicu message.
>
> But downgrading to R 3.1.3 seems to have cured things.
>
> I'm still baffled, however, since I'm writing this on my desktop
> computer, which has R version 3.2.1 and a successful, working install
> of ggplot2 1.0.1.
>
> --
> Jeff Trefftzs
> http://www.trefftzs.org
>
>



Re: [R] how to split row elements [1] and [2] of a string variable A via strsplit and sapply

2015-09-10 Thread Bert Gunter
...
Alternatively, you can avoid the looping (i.e. sapply) altogether by:

do.call(rbind,strsplit(x[[1]],":"))[,-3]


 [,1] [,2]
[1,] "1"  "29439275"
[2,] "5"  "85928892"
[3,] "10" "128341232"
[4,] "1"  "106024283"
[5,] "3"  "62707519"
[6,] "2"  "80464120"

These can then be added to the existing frame, converted to numeric, etc.
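Because row 4 has three fields, rbind() has to recycle the two-field rows out to three columns, which should produce a recycling warning before [,-3] drops the extra column. A sketch that keeps only the first two fields of every entry up front, then converts them to numbers and adds them to the frame (using the six example rows from the thread):

```r
x <- read.table(text = "A B
1:29439275 0.46773514
5:85928892 0.81283052
10:128341232 0.09332543
1:106024283:ID 0.36307805
3:62707519 0.42657952
2:80464120 0.89125094", header = TRUE, stringsAsFactors = FALSE)

# Split on ":" and keep only fields 1 and 2; a third field ("ID") is ignored
parts <- lapply(strsplit(x$A, ":", fixed = TRUE), `[`, 1:2)
m <- do.call(rbind, parts)   # 6 x 2 character matrix, no recycling needed

# Convert to numeric and attach as new columns
x$C <- as.numeric(m[, 1])
x$D <- as.numeric(m[, 2])
x
```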

Cheers,
Bert
Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Thu, Sep 10, 2015 at 11:05 AM, jim holtman  wrote:
> try this:
>
>
>> x <- read.table(text = "A  B
> +  1:29439275 0.46773514
> +  5:85928892 0.81283052
> +  10:128341232 0.09332543
> +  1:106024283:ID 0.36307805
> +  3:62707519 0.42657952
> +  2:80464120 0.89125094", header = TRUE, as.is = TRUE)
>>
>> temp <- strsplit(x$A, ":")
>> x$C <- sapply(temp, '[[', 1)
>> x$D <- sapply(temp, '[[', 2)
>>
>> x
>A  B  C D
> 1 1:29439275 0.46773514  1  29439275
> 2 5:85928892 0.81283052  5  85928892
> 3   10:128341232 0.09332543 10 128341232
> 4 1:106024283:ID 0.36307805  1 106024283
> 5 3:62707519 0.42657952  3  62707519
> 6 2:80464120 0.89125094  2  80464120
>
>
>
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> On Thu, Sep 10, 2015 at 1:46 PM, aldi  wrote:
>
>> Hi,
>> I have a data.frame x1, of which a variable A needs to be split by
>> element 1 and element 2 where separator is ":". Sometimes could be three
>> elements in A, but I do not need the third element.
>>
>> Since R does not have a SCAN function as in SAS, C=scan(A,1,":");
>> D=scan(A,2,":");
>> I am using a combination of strsplit and sapply. If I do not use the
>> index [i] then R captures the full vector . Instead I need row by row
>> capturing the first and the second element and from them create two new
>> variables C and D.
>> Right now as is somehow in the loop i C is captured correctly, but D is
>> missing because the variables AA does not have it. Any suggestions?
>> Thank you in advance, Aldi
>>
>> A  B
>> 1:29439275 0.46773514
>> 5:85928892 0.81283052
>> 10:128341232 0.09332543
>> 1:106024283:ID 0.36307805
>> 3:62707519 0.42657952
>> 2:80464120 0.89125094
>>
>> x1<-read.table(file='./test.txt',head=T,sep='\t')
>> x1$A <- as.character(x1$A)
>>
>> for(i in 1:length(x1$A)){
>>
>> x1$AA[i] <- as.numeric(unlist(strsplit(x1$A[i],':')))
>>
>> x1$C[i] <- sapply(x1$AA[i],function(x)x[1])
>> x1$D[i] <- sapply(x1$AA[i],function(x)x[2])
>> }
>>
>> x1
>>
>>
>>
>>  > x1
>> A  B AA  C  D
>> 1 1:29439275 0.46773514  1  1 NA
>> 2 5:85928892 0.81283052  5  5 NA
>> 3   10:128341232 0.09332543 10 10 NA
>> 4 1:106024283:ID 0.36307805  1  1 NA
>> 5 3:62707519 0.42657952  3  3 NA
>> 6 2:80464120 0.89125094  2  2 NA
>>
>>
>>
>



[R] r function idea: minimize() to turn reproducible_example.R into minimal_reproducible_example.R

2015-09-10 Thread Anthony Damico
just going to throw this idea out there in case it's something that anyone
wants to pursue: if i have an R script and i'm hitting some unexpected
behavior, there should be some way to remove extraneous objects and
manipulations that never touch the line that i'm trying to reproduce.
automatically stepping through the code and removing things that never
affect the final line seems (difficult but) possible.  so if my_example.R
looks like this code and i didn't understand why i was hitting an error at
the third line..

x <- mtcars
y <- mean( mtcars$mpg )
mean( x[ , "hello" ] )

..the function i am envisioning would automatically remove the `y <- mean(
mtcars$mpg )` because that object and all subsequent objects do not affect
the error resulting from the third line.  in other words, pointing this
minimize() function to the error..

minimize( "my_example.R" , 'Error in `[.data.frame`(x, , "hello") :
undefined columns selected' )

..would find that the error happens on the third line, then follow things
backward and remove any command that does not touch the line that results
in the error.  so my reproducible example was three lines, but the minimal
reproducible example became two lines.

i understand it might be impossible to automate all of the minimizing, but
i think there might be enough low-hanging fruit here that this might be a
quick and useful debugging tool for those of us trying to create
easier-to-reproduce code for members of this list.

thanks!



Re: [R] how to split row elements [1] and [2] of a string variable A via strsplit and sapply

2015-09-10 Thread jim holtman
try this:


> x <- read.table(text = "A  B
+  1:29439275 0.46773514
+  5:85928892 0.81283052
+  10:128341232 0.09332543
+  1:106024283:ID 0.36307805
+  3:62707519 0.42657952
+  2:80464120 0.89125094", header = TRUE, as.is = TRUE)
>
> temp <- strsplit(x$A, ":")
> x$C <- sapply(temp, '[[', 1)
> x$D <- sapply(temp, '[[', 2)
>
> x
               A          B  C         D
1     1:29439275 0.46773514  1  29439275
2     5:85928892 0.81283052  5  85928892
3   10:128341232 0.09332543 10 128341232
4 1:106024283:ID 0.36307805  1 106024283
5     3:62707519 0.42657952  3  62707519
6     2:80464120 0.89125094  2  80464120




Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Thu, Sep 10, 2015 at 1:46 PM, aldi  wrote:

> Hi,
> I have a data.frame x1, of which a variable A needs to be split by
> element 1 and element 2 where separator is ":". Sometimes could be three
> elements in A, but I do not need the third element.
>
> Since R does not have a SCAN function as in SAS, C=scan(A,1,":");
> D=scan(A,2,":");
> I am using a combination of strsplit and sapply. If I do not use the
> index [i] then R captures the full vector . Instead I need row by row
> capturing the first and the second element and from them create two new
> variables C and D.
> Right now as is somehow in the loop i C is captured correctly, but D is
> missing because the variables AA does not have it. Any suggestions?
> Thank you in advance, Aldi
>
> A  B
> 1:29439275 0.46773514
> 5:85928892 0.81283052
> 10:128341232 0.09332543
> 1:106024283:ID 0.36307805
> 3:62707519 0.42657952
> 2:80464120 0.89125094
>
> x1<-read.table(file='./test.txt',head=T,sep='\t')
> x1$A <- as.character(x1$A)
>
> for(i in 1:length(x1$A)){
>
> x1$AA[i] <- as.numeric(unlist(strsplit(x1$A[i],':')))
>
> x1$C[i] <- sapply(x1$AA[i],function(x)x[1])
> x1$D[i] <- sapply(x1$AA[i],function(x)x[2])
> }
>
> x1
>
>
>
>  > x1
> A  B AA  C  D
> 1 1:29439275 0.46773514  1  1 NA
> 2 5:85928892 0.81283052  5  5 NA
> 3   10:128341232 0.09332543 10 10 NA
> 4 1:106024283:ID 0.36307805  1  1 NA
> 5 3:62707519 0.42657952  3  3 NA
> 6 2:80464120 0.89125094  2  2 NA
>
>
>



[R] how to split row elements [1] and [2] of a string variable A via strsplit and sapply

2015-09-10 Thread aldi
Hi,
I have a data.frame x1 in which a variable A needs to be split into
element 1 and element 2, where the separator is ":". Sometimes A has three
elements, but I do not need the third one.

Since R does not have an equivalent of the SAS SCAN function
(C=scan(A,1,":"); D=scan(A,2,":");), I am using a combination of strsplit
and sapply. If I do not use the index [i], R captures the full vector;
instead, I need to capture the first and second elements row by row and
create two new variables, C and D, from them.
Right now, inside the loop over i, C is captured correctly, but D is
missing because the variable AA does not contain it. Any suggestions?
Thank you in advance, Aldi

A  B
1:29439275 0.46773514
5:85928892 0.81283052
10:128341232 0.09332543
1:106024283:ID 0.36307805
3:62707519 0.42657952
2:80464120 0.89125094

x1<-read.table(file='./test.txt',head=T,sep='\t')
x1$A <- as.character(x1$A)

for(i in 1:length(x1$A)){

x1$AA[i] <- as.numeric(unlist(strsplit(x1$A[i],':')))

x1$C[i] <- sapply(x1$AA[i],function(x)x[1])
x1$D[i] <- sapply(x1$AA[i],function(x)x[2])
}

x1



> x1
               A          B AA  C  D
1     1:29439275 0.46773514  1  1 NA
2     5:85928892 0.81283052  5  5 NA
3   10:128341232 0.09332543 10 10 NA
4 1:106024283:ID 0.36307805  1  1 NA
5     3:62707519 0.42657952  3  3 NA
6     2:80464120 0.89125094  2  2 NA
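For reference, the NA values in D come from x1$AA[i] <- as.numeric(...): the right-hand side holds two (or three) values but only one element is replaced, so R keeps the first value (with a warning) and the second is lost. as.numeric() also turns the "ID" field into NA with a warning. A sketch of a corrected loop that stays close to the original structure (the six example rows are read from text instead of ./test.txt):

```r
x1 <- read.table(text = "A B
1:29439275 0.46773514
5:85928892 0.81283052
10:128341232 0.09332543
1:106024283:ID 0.36307805
3:62707519 0.42657952
2:80464120 0.89125094", header = TRUE, stringsAsFactors = FALSE)

x1$C <- NA_real_
x1$D <- NA_real_
for (i in seq_len(nrow(x1))) {
  # All fields of row i; a third field (e.g. "ID") is simply not used
  parts <- strsplit(x1$A[i], ":", fixed = TRUE)[[1]]
  x1$C[i] <- as.numeric(parts[1])
  x1$D[i] <- as.numeric(parts[2])
}
x1
```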




[R] kendall tau distance

2015-09-10 Thread Ragia Ibrahim
Dear group,
how can I calculate the Kendall tau distance as defined in the Wikipedia
article Kendall_tau_distance?

https://en.wikipedia.org/wiki/Kendall_tau_distance

Thanks in advance,
Ragia
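The Wikipedia definition counts discordant pairs, i.e. pairs of items that the two rankings put in opposite order. A direct O(n^2) sketch in base R follows. kendall_tau_distance() is a hypothetical helper name; it assumes r1[k] and r2[k] are the ranks of item k in the two lists, does not count tied pairs as discordant, and divides by n(n-1)/2 when normalize = TRUE.

```r
# Hypothetical helper: Kendall tau distance = number of discordant pairs
kendall_tau_distance <- function(r1, r2, normalize = FALSE) {
  stopifnot(length(r1) == length(r2), length(r1) >= 2)
  n <- length(r1)
  pairs <- combn(n, 2)          # every index pair i < j, one per column
  i <- pairs[1, ]
  j <- pairs[2, ]
  # Discordant: the two rankings order items i and j in opposite directions
  discordant <- sign(r1[i] - r1[j]) * sign(r2[i] - r2[j]) < 0
  d <- sum(discordant)
  if (normalize) d / choose(n, 2) else d
}

r1 <- c(1, 2, 3, 4, 5)
r2 <- c(3, 4, 1, 2, 5)
kendall_tau_distance(r1, r2)                    # 4 discordant pairs
kendall_tau_distance(r1, r2, normalize = TRUE)  # 4 / 10 = 0.4

# Without ties it relates to Kendall's tau: tau = 1 - 4 * d / (n * (n - 1))
cor(r1, r2, method = "kendall")                 # 0.2
```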

Re: [R] Counting occurrences of a set of values

2015-09-10 Thread Frank Schwidom


df <- data.frame( V1= 1, V2= c( 2, 3, 2, 1), V3= c( 1, 2, 1, 1))
dfO <- df[ do.call( order, df), ]
dfOD <- duplicated( dfO)
dfODTrigger <- ! c( dfOD[-1], FALSE)
dfOCounts <- diff( c( 0, which( dfODTrigger)))
cbind( dfO[ dfODTrigger, ], dfOCounts)

  V1 V2 V3 dfOCounts
4  1  1  1         1
3  1  2  1         2
2  1  3  2         1

Regards


On Thu, Sep 10, 2015 at 01:11:24PM +, Thomas Chesney wrote:
> Can anyone suggest a way of counting how frequently sets of values occurs in 
> a data frame? Like table() only with sets.
> 
> So for a dataset:
> 
> V1, V2, V3
> 1, 2, 1
> 1, 3, 2
> 1, 2, 1
> 1, 1, 1
> 
> The output would be something like:
> 
> 1,2,1: 2
> 1,3,2: 1
> 1,1,1: 1
> 
> Thank you,
> 
> Thomas Chesney
> 
> 
> 
> This message and any attachment are intended solely for the addressee
> and may contain confidential information. If you have received this
> message in error, please send it back to me, and immediately delete it. 
> 
> Please do not use, copy or disclose the information contained in this
> message or in any attachment.  Any views or opinions expressed by the
> author of this email do not necessarily reflect the views of the
> University of Nottingham.
> 
> This message has been checked for viruses but the contents of an
> attachment may still contain software viruses which could damage your
> computer system, you are advised to perform your own checks. Email
> communications with the University of Nottingham may be monitored as
> permitted by UK legislation.
> 


Re: [R] Counting occurrences of a set of values

2015-09-10 Thread Thierry Onkelinx
Have a look at the dplyr package

library(dplyr)
n <- 1000
data_frame(
  V1 = sample(0:1, n, replace = TRUE),
  V2 = sample(0:1, n, replace = TRUE),
  V3 = sample(0:1, n, replace = TRUE)
) %>%
  group_by(V1, V2, V3) %>%
  mutate(
    Freq = n()
  )
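Note that mutate() keeps all n rows and attaches each row's group size as Freq. To get one row per combination, as in the question, count() (or group_by() followed by summarise(Freq = n())) should be closer. A sketch on Thomas's four example rows, assuming a reasonably recent dplyr:

```r
library(dplyr)

Data <- data.frame(V1 = c(1, 1, 1, 1), V2 = c(2, 3, 2, 1), V3 = c(1, 2, 1, 1))

# One row per distinct (V1, V2, V3) combination, frequency in column n
res <- count(Data, V1, V2, V3)
res
```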


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2015-09-10 15:11 GMT+02:00 Thomas Chesney :

> Can anyone suggest a way of counting how frequently sets of values occurs
> in a data frame? Like table() only with sets.
>
> So for a dataset:
>
> V1, V2, V3
> 1, 2, 1
> 1, 3, 2
> 1, 2, 1
> 1, 1, 1
>
> The output would be something like:
>
> 1,2,1: 2
> 1,3,2: 1
> 1,1,1: 1
>
> Thank you,
>
> Thomas Chesney
>
>
>


Re: [R] Counting occurrences of a set of values

2015-09-10 Thread Fox, John
Dear Thomas,

How about this?

> table(apply(Data, 1, paste, collapse=","))

1,1,1 1,2,1 1,3,2 
    1     2     1 

I hope this helps,
 John

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Thomas
> Chesney
> Sent: September 10, 2015 9:11 AM
> To: r-help@r-project.org
> Subject: [R] Counting occurrences of a set of values
> 
> Can anyone suggest a way of counting how frequently sets of values occurs in a
> data frame? Like table() only with sets.
> 
> So for a dataset:
> 
> V1, V2, V3
> 1, 2, 1
> 1, 3, 2
> 1, 2, 1
> 1, 1, 1
> 
> The output would be something like:
> 
> 1,2,1: 2
> 1,3,2: 1
> 1,1,1: 1
> 
> Thank you,
> 
> Thomas Chesney
> 
> 
> 


Re: [R] Counting occurrences of a set of values

2015-09-10 Thread Duncan Murdoch
On 10/09/2015 9:11 AM, Thomas Chesney wrote:
> Can anyone suggest a way of counting how frequently sets of values occurs in 
> a data frame? Like table() only with sets.

Do you want 1,2,1 to be the same as 1,1,2, or different?  What about
1,2,2?  For sets, those are all the same, but for most purposes, they
aren't.  If you really want to keep the ordering, then table() does the
counting you want, it just returns it in an ugly format.

Duncan Murdoch


> 
> So for a dataset:
> 
> V1, V2, V3
> 1, 2, 1
> 1, 3, 2
> 1, 2, 1
> 1, 1, 1
> 
> The output would be something like:
> 
> 1,2,1: 2
> 1,3,2: 1
> 1,1,1: 1
> 
> Thank you,
> 
> Thomas Chesney
> 
> 
> 


[R] Counting occurrences of a set of values

2015-09-10 Thread Thomas Chesney
Can anyone suggest a way of counting how frequently sets of values occur in a 
data frame? Like table(), only with sets.

So for a dataset:

V1, V2, V3
1, 2, 1
1, 3, 2
1, 2, 1
1, 1, 1

The output would be something like:

1,2,1: 2
1,3,2: 1
1,1,1: 1

Thank you,

Thomas Chesney





Re: [R] Pb with date time sequence with period<1sec

2015-09-10 Thread peter dalgaard
Plus, current dates are an awful lot of seconds since 1970-01-01, so the 
relative error on second-scale differences is bigger than you might think:

> as.numeric(Sys.time())
[1] 1441874878

and since relative representation errors are of the order 1e-16, the 
corresponding absolute errors are about 1e-7.

(The examples given forgot to tell us what nom_fich is supposed to be, but I 
assume something relatively current was meant.)
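A sketch of what this means in practice, and of two common ways around it: compare at the sampling resolution with a tolerance, or carry an exact integer sample index and derive the times from it. The start time is a hypothetical stand-in for strptime(nom_fich, ...), set in UTC to avoid time-zone effects, and ulp is just a local name for the spacing between adjacent doubles. Note also that ?strptime documents %OSn output as truncated rather than rounded, which may explain seeing a value printed just under 0.05.

```r
# Hypothetical start time: the thread does not say what nom_fich contains
start <- as.POSIXct("2015-09-10 00:00:00", tz = "UTC")
step  <- 0.05                                            # 20 samples per second
tt    <- seq(from = start, by = step, length.out = 20 * 60)  # one minute

# Spacing of adjacent doubles near this epoch value (2^-22, about 2.4e-07 s)
ulp <- 2^(floor(log2(as.numeric(start))) - 52)
ulp

# Fractional seconds of the second time point: 0.05 up to an error of that order
frac <- as.POSIXlt(tt[2])$sec
frac - step

# Option 1: compare with a tolerance well above the representation error
abs(frac - step) < 1e-6

# Option 2: keep an exact integer sample index and derive times from it
tick <- round((as.numeric(tt) - as.numeric(start)) / step)
```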

-pd

On 09 Sep 2015, at 16:50 , Sarah Goslee  wrote:

> Looks like R FAQ 7.31 to me.
> 
> https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
> 
> On Wed, Sep 9, 2015 at 4:19 AM, DE  wrote:
>> Hi,
>> 
>> I'd like to create a date-time seq with a period of 0.05 s, over several
>> days.
>> 
>> # try :
>> start<-strptime(nom_fich,format="%y%m%d")
>> time<-seq(from=start, by=0.05, length.out = 86400*20*3)
>> print(as.POSIXlt(time[2])$sec)
>> # result is 0.0495 and not 0.05 as expected
>> 
>> But If I am looking at the sequence, the seconds are not separated by 0.05,
>> but by something very close (0.0495). Same pb if I want to add a
>> fraction of seconds to a date-time object :
>> 
>> # try :
>> start<-strptime(nom_fich,format="%y%m%d")
>> as.POSIXlt(start+0.05,origin="1970-01-01")$sec
>> # result is 0.0495 and not 0.05 as expected
>> 
>> Any idea to solve this pb ?
>> 
>> Thank you in advance !
>> 
>> 
> 
> 
> -- 
> Sarah Goslee
> http://www.functionaldiversity.org
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com
