Re: [R] counting the duplicates in an object of list

2011-09-07 Thread zhenjiang xu
Hmm, frustrating. BTW, unique() works all right. It seems either not to use
deparse() or to use it differently.
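
A small sketch of how the 500-character hypothesis from this thread can be checked on made-up data. The helper name deparsed() and the example vectors are assumptions for illustration only; whether match() on lists still truncates depends on the R version, so this only shows that two different vectors can share their first 500 deparsed characters.

```r
# Hypothetical helper: collapse deparse() output into one string per element.
deparsed <- function(v) paste(deparse(v), collapse = "")

# Made-up vectors sharing 60 leading elements and differing only at the end.
common <- sprintf("GENE%04d", 1:60)
x <- c(common, "tail-A")
y <- c(common, "tail-B")

dx <- deparsed(x)
dy <- deparsed(y)

nchar(dx) > 500                                    # long enough to be cut off
identical(substr(dx, 1, 500), substr(dy, 1, 500))  # same first 500 characters
identical(x, y)                                    # yet the vectors differ
```

If the second check is TRUE while the third is FALSE for two elements of the real list, any comparison based on a 500-character prefix would treat them as equal.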

On Wed, Sep 7, 2011 at 11:27 PM, William Dunlap  wrote:

> I don't think you can increase width.cutoff above 500, and it isn't an
> argument to as.character or match.  The best solution would be to avoid
> the internal use of deparse when using match() or unique() on lists and
> to hash the list elements directly, but that is a fair bit of work.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com 
>
> *From:* zhenjiang xu [mailto:zhenjiang...@gmail.com]
> *Sent:* Wednesday, September 07, 2011 8:04 PM
>
> *To:* William Dunlap
> *Cc:* r-help
> *Subject:* Re: [R] counting the duplicates in an object of list
>
> I tried converting the elements to strings before, but due to the large
> data size it took forever to finish with paste(). Is there any way to set
> the default width.cutoff longer and pass it to match()?
>
> On Wed, Sep 7, 2011 at 10:42 PM, William Dunlap  wrote:
> 
>
> match(aList, aList) probably does what as.character(aList) does: cut off
> the character strings at 500 characters (because deparse(x, nlines=1,
> width.cutoff) requires that width.cutoff<=500).  Try converting the
> elements to character strings yourself before passing them to match.  E.g.,
>
> ac <- sapply(a, function(ai) paste(collapse="\n", deparse(ai)))
>
> and use match on that.  You can use the indices it returns on the
> original list.
>
>  
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com 
>
> *From:* zhenjiang xu [mailto:zhenjiang...@gmail.com]
> *Sent:* Wednesday, September 07, 2011 7:25 PM
> *To:* William Dunlap
> *Cc:* r-help
> *Subject:* Re: [R] counting the duplicates in an object of list
>
>  
>
> Now I have nailed down the problem, but I am still confused about why
> match() treats the first two components and the last two as the same.
>
> > match(a,a)
> [1] 1 2 3 1 2
>
> > a
> [[1]]
>  [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
>  [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
> [13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
> [19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
> [25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
> [31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-3"
> [37] "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"   "YMRCTy1-3" "YMR045C"
> [43] "YMRCTy1-4" "YMR050C"   "YNLCTy1-1" "YNL284C-B" "YNLWTy1-2" "YNL054W-B"
> [49] "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B" "YPLWTy1-1" "YPL257W-B"
> [55] "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"
>
> [[2]]
>  [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
>  [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
> [13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
> [19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
> [25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
> [31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-2"
> [37] "YLR227W-B" "YLRWTy1-3" "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"

Re: [R] counting the duplicates in an object of list

2011-09-07 Thread zhenjiang xu
I tried converting the elements to strings before, but due to the large data
size it took forever to finish with paste(). Is there any way to set the
default width.cutoff longer and pass it to match()?

On Wed, Sep 7, 2011 at 10:42 PM, William Dunlap  wrote:

> match(aList, aList) probably does what as.character(aList) does: cut off
> the character strings at 500 characters (because deparse(x, nlines=1,
> width.cutoff) requires that width.cutoff<=500).  Try converting the
> elements to character strings yourself before passing them to match.  E.g.,
>
> ac <- sapply(a, function(ai) paste(collapse="\n", deparse(ai)))
>
> and use match on that.  You can use the indices it returns on the
> original list.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com 
>
> *From:* zhenjiang xu [mailto:zhenjiang...@gmail.com]
> *Sent:* Wednesday, September 07, 2011 7:25 PM
> *To:* William Dunlap
> *Cc:* r-help
> *Subject:* Re: [R] counting the duplicates in an object of list
>
> Now I have nailed down the problem, but I am still confused about why
> match() treats the first two components and the last two as the same.
>
> > match(a,a)
> [1] 1 2 3 1 2
>
> > a
> [[1]]
>  [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
>  [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
> [13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
> [19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
> [25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
> [31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-3"
> [37] "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"   "YMRCTy1-3" "YMR045C"
> [43] "YMRCTy1-4" "YMR050C"   "YNLCTy1-1" "YNL284C-B" "YNLWTy1-2" "YNL054W-B"
> [49] "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B" "YPLWTy1-1" "YPL257W-B"
> [55] "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"
>
> [[2]]
>  [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
>  [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
> [13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
> [19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
> [25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
> [31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-2"
> [37] "YLR227W-B" "YLRWTy1-3" "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"
> [43] "YMRCTy1-3" "YMR045C"   "YMRCTy1-4" "YMR050C"   "YNLCTy1-1" "YNL284C-B"
> [49] "YNLWTy1-2" "YNL054W-B" "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B"
> [55] "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"
> [61] "YPRCTy1-4" "YPR158C-D"
>
> [[3]]
>  [1] "YARCTy1-1" "YAR009C"   "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D"
>  [7] "YDRCTy1-3" "YDR261C-D" "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B"
> [13] "YERCTy1-1"

Re: [R] how to create data.frames from vectors with duplicates

2011-09-07 Thread zhenjiang xu
Thanks for benchmarking them. data.table is indeed worth looking at.
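
For anyone who wants to repeat the comparison without installing data.table, here is a minimal base-R timing sketch. The problem size, the seed, and the variable names are assumptions chosen for a quick run; they are not the settings used in the quoted benchmark, and the timings will differ by machine.

```r
set.seed(1)   # arbitrary seed, only so the made-up data is reproducible
n <- 1e5      # assumed size, far smaller than the 10M rows quoted below
x <- sample(1:100, n, replace = TRUE)
y <- sample(1:1000, n, replace = TRUE)

t_tapply <- system.time(r1 <- tapply(x, y, sum))
t_rowsum <- system.time(r2 <- rowsum(x, y))

# Both results are ordered by group, so the sums can be compared directly
# before looking at the timings.
stopifnot(all(as.vector(r1) == r2[, 1]))

c(tapply = t_tapply[["elapsed"]], rowsum = t_rowsum[["elapsed"]])
```

Checking that the two approaches agree before comparing their speed avoids benchmarking a result that is not actually equivalent.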

On Wed, Sep 7, 2011 at 9:55 PM, Dennis Murphy  wrote:

> Hi:
>
> Here are a few informal timings on my machine with the following
> example. The data.table package is worth investigating, particularly
> in problems where its advantages can scale with size.
>
> library(data.table)
> dt <- data.table(x = sample(1:50, 100, replace = TRUE),
>  y = sample(letters[1:26], 100, replace = TRUE),
>  key = 'y')
> system.time(dt[, list(count = sum(x)), by = 'y'])
>    user  system elapsed
>    0.02    0.00    0.02
>
> # Data tables are also data frames, so we can use them as such:
>
> system.time(with(dt, tapply(x, y, sum)))
>    user  system elapsed
>    0.39    0.00    0.39
> system.time(with(dt, rowsum(x, y)))
>    user  system elapsed
>    0.04    0.00    0.03
> system.time(aggregate(x ~ y, data = dt, FUN = sum))
>    user  system elapsed
>    1.87    0.00    1.87
>
> So rowsum() is good, but data.table is a little better for this task.
> Increasing the size of the problem is to the advantage of both
> data.table and rowsum(), but tapply() takes a fair bit longer,
> relatively speaking (appx. 10x rowsum() in the first example, 20x in
> the second example). The ratios of rowsum() to data.table are about
> the same (appx. 2x).
>
> # 10M observations, 1000 groups
> > dt <- data.table(x = sample(1:100, 1e7, replace = TRUE),
> +                  y = sample(1:1000, 1e7, replace = TRUE),
> +                  key = 'y')
> > system.time(dt[, list(count = sum(x)), by = 'y'])
>    user  system elapsed
>    0.16    0.03    0.18
> > system.time(with(dt, rowsum(x, y)))
>    user  system elapsed
>    0.36    0.04    0.40
> > system.time(with(dt, tapply(x, y, sum)))
>    user  system elapsed
>    8.77    0.33    9.11
>
> HTH,
> Dennis
>
>
> On Wed, Sep 7, 2011 at 6:18 PM, zhenjiang xu 
> wrote:
> > Thanks for all your replies. I am using rowsum() and it looks efficient.
> I
> > hope I could do some benchmark sometime in near future and let people
> know.
> > Or is there any benchmark result available?
> >
> > On Wed, Aug 31, 2011 at 12:58 PM, Bert Gunter  >wrote:
> >
> >> Inline below:
> >>
> >> On Wed, Aug 31, 2011 at 9:50 AM, Jorge I Velez <
> jorgeivanve...@gmail.com>
> >> wrote:
> >> > Hi Zhenjiang,
> >> >
> >> > Try
> >> >
> >> > table(unlist(mapply(function(x, y) rep(x, y), y, x)))
> >>
> >> Yikes! How about simply tapply(x,y,sum) ??
> >> ?tapply
> >>
> >> -- Bert
> >> >
> >> > HTH,
> >> > Jorge
> >> >
> >> >
> >> > On Wed, Aug 31, 2011 at 12:45 PM, zhenjiang xu <> wrote:
> >> >
> >> >> Hi R users,
> >> >>
> >> >> suppose I have two vectors,
> >> >>  > x=c(1,2,3,4,5)
> >> >>  > y=c('a','b','c','a','c')
> >> >> How can I get a data.frame like this?
> >> >> > xy
> >> >>  count
> >> >> a 5
> >> >> b 2
> >> >> c 8
> >> >>
> >> >> I know a few ways to fulfill the task. However, I have a huge number
> >> >> of this kind calculations, so I'd like an efficient solution. Thanks
> >> >>
> >> >> --
> >> >> Best,
> >> >> Zhenjiang
> >> >>
> >> >> __
> >> >> R-help@r-project.org mailing list
> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> PLEASE do read the posting guide
> >> >> http://www.R-project.org/posting-guide.html
> >> >> and provide commented, minimal, self-contained, reproducible code.
> >> >>
> >> >
> >> >[[alternative HTML version deleted]]
> >> >
> >> >
> >>
> >
> >
> >
> > --
> > Best,
> > Zhenjiang
> >
> >
>



-- 
Best,
Zhenjiang



Re: [R] counting the duplicates in an object of list

2011-09-07 Thread zhenjiang xu
Now I have nailed down the problem, but I am still confused about why match()
treats the first two components and the last two as the same.

> match(a,a)
[1] 1 2 3 1 2

> a
[[1]]
 [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-3"
[37] "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"   "YMRCTy1-3" "YMR045C"
[43] "YMRCTy1-4" "YMR050C"   "YNLCTy1-1" "YNL284C-B" "YNLWTy1-2" "YNL054W-B"
[49] "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B" "YPLWTy1-1" "YPL257W-B"
[55] "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"

[[2]]
 [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-2"
[37] "YLR227W-B" "YLRWTy1-3" "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"
[43] "YMRCTy1-3" "YMR045C"   "YMRCTy1-4" "YMR050C"   "YNLCTy1-1" "YNL284C-B"
[49] "YNLWTy1-2" "YNL054W-B" "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B"
[55] "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"
[61] "YPRCTy1-4" "YPR158C-D"

[[3]]
 [1] "YARCTy1-1" "YAR009C"   "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D"
 [7] "YDRCTy1-3" "YDR261C-D" "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B"
[13] "YERCTy1-1" "YER138C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[19] "YJRWTy1-1" "YJR027W"   "YJRWTy1-2" "YJR029W"   "YLRCTy1-1" "YLR157C-B"
[25] "YLRWTy1-3" "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"   "YMRCTy1-4"
[31] "YMR050C"   "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B" "YPLWTy1-1"
[37] "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"

[[4]]
 [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-3"
[37] "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"   "YMRCTy1-3" "YMR045C"
[43] "YMRCTy1-4" "YMR050C"   "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B"
[49] "YPLWTy1-1" "YPL257W-B"

Re: [R] read.table truncated data?

2011-09-07 Thread zhenjiang xu
Thanks for the suggestion. I guess that's the only thing I can do.

On Fri, Aug 26, 2011 at 4:22 AM, Petr PIKAL  wrote:

> Hi
>
> >
> > Thanks, Jim. quote='' works. And then I found a single quote in each of
> > these lines:
> > 3262
> > 10403
> > 17544
> > 24685
> > 31826
> > 38967
> >
> > None of them near the position the table got truncated. Why is it?
> >
> > And read.table is a great function. Is it possible for it to give a
> warning
> > message when the data gets truncated? In my case I almost looked over
> the
> > truncation...
>
> When I read in some big data I usually do
>
> str(data)
>
> which tells me if there is some problem with data types (conversion of
> numeric to factor due to any problematic item)
>
> and/or
>
> dim(data)
>
> to see that size is as expected.
>
> Regards
> Petr
>
> >
> > On Thu, Aug 25, 2011 at 11:57 AM, jim holtman 
> wrote:
> >
> > > But did you try the following:
> > >
> > > x <- read.table(, comment.char = '', quote = '')
> > >
> > > Most cases is that there is a missing quote somewhere in your data.
> > > use a text editor and search for single and double quotes.
> > >
> > > On Thu, Aug 25, 2011 at 11:49 AM, zhenjiang xu
> 
> > > wrote:
> > > > Thanks for your replies. I looked at those lines and didn't spot
> anything
> > > > unusual.
> > > >
> > > >> tail(a)
> > > >test_id gene_id gene   locus sample_1 sample_2
> status
> > > > 21418 tY(GUA)J1   - SUP7 chr10:354243-354332 air1rrp6 air2rrp6
> OK
> > > > 21419 tY(GUA)J2   - SUP4 chr10:542955-543044 air1rrp6 air2rrp6
> OK
> > > > 21420 tY(GUA)M1   - SUP5 chr13:168794-168883 air1rrp6 air2rrp6
> OK
> > > > 21421 tY(GUA)M2   - SUP8 chr13:837927-838016 air1rrp6 air2rrp6
> OK
> > > > 21422  tY(GUA)O   - SUP3 chr15:288191-288280 air1rrp6 air2rrp6
> OK
> > > > 21423  tY(GUA)Q   --   chrmt:70823-70907 air1rrp6 air2rrp6
> > > OK
> > > >  value_1 value_2 ln.fold_change. test_stat  p_value  q_value
> > > > significant
> > > > 21418 0.0  0.0.00   0.0 1.00 1.011650
> > > >  no
> > > > 21419 0.0  0.0.00   0.0 1.00 1.011480
> > > >  no
> > > > 21420 0.0  0.0.00   0.0 1.00 1.011500
> > > >  no
> > > > 21421 0.0  0.0.00   0.0 1.00 1.011520
> > > >  no
> > > > 21422 0.0  0.0.00   0.0 1.00 1.011550
> > > >  no
> > > > 21423 6.68356 10.73970.474301  -1.08614 0.277417 0.455917
> > > >  no
> > > >
> > > >
> > > > tY(GUA)J1   -   SUP7chr10:354243-354332 rrp6
> air1rrp6
> > > >   OK  0   0   0   0   11.00404  no
> > > > tY(GUA)J2   -   SUP4chr10:542955-543044 rrp6
> air1rrp6
> > > >   OK  0   0   0   0   11.00497  no
> > > > tY(GUA)M1   -   SUP5chr13:168794-168883 rrp6
> air1rrp6
> > > >   OK  0   0   0   0   11.00492  no
> > > > tY(GUA)M2   -   SUP8chr13:837927-838016 rrp6
> air1rrp6
> > > >   OK  0   0   0   0   11.00488  no
> > > > tY(GUA)O-   SUP3chr15:288191-288280 rrp6
> air1rrp6
> > > >   OK  0   0   0   0   11.00485  no
> > > > tY(GUA)Q-   -   chrmt:70823-70907   rrp6
> air1rrp6
> > > >   OK  4.49644 6.68356 0.396365-0.766052 0.443645
> > > >  0.634724no
> > > > 15S_rRNA-   15S_RRNAchrmt:6545-8194 WT air2rrp6
> > > >   OK  2288.88 711.697 -1.168172.78772   0.00530801
> > > >  0.0167772   yes
> > > > 21S_rRNA-   21S_RRNAchrmt:58008-62447   WT
> > > >  air2rrp6OK  4134.59 1927.04 -0.7634 1.58991 0.111855
> > > >   0.22339 no
> > > > ETS1-1  -   ETS1-1  chr12:457732-458432 WT  air2rrp6
> > >  OK
> > > >   3258.97 1114.76 -1.072772.91211 0.00359   0.0121587
> > > yes
> > > > ETS1-2  -   ETS1-2  chr12:466869-467569 WT  air2rrp6
> > > 

Re: [R] how to create data.frames from vectors with duplicates

2011-09-07 Thread zhenjiang xu
Thanks for all your replies. I am using rowsum() and it looks efficient. I
hope to run some benchmarks in the near future and let people know.
Or are there any benchmark results already available?
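
For reference, a minimal sketch of the rowsum() approach applied to the example vectors from the original question. The names sums and xy are just illustrative; the "count" column name follows the desired output shown in the question.

```r
x <- c(1, 2, 3, 4, 5)
y <- c("a", "b", "c", "a", "c")

# rowsum() returns a one-column matrix whose row names are the groups.
sums <- rowsum(x, y)

# Turn it into the data.frame layout asked for in the question.
xy <- data.frame(count = sums[, 1], row.names = rownames(sums))
xy
```

rowsum() sorts the groups by default, so the rows come out in the order a, b, c.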

On Wed, Aug 31, 2011 at 12:58 PM, Bert Gunter wrote:

> Inline below:
>
> On Wed, Aug 31, 2011 at 9:50 AM, Jorge I Velez 
> wrote:
> > Hi Zhenjiang,
> >
> > Try
> >
> > table(unlist(mapply(function(x, y) rep(x, y), y, x)))
>
> Yikes! How about simply tapply(x,y,sum) ??
> ?tapply
>
> -- Bert
> >
> > HTH,
> > Jorge
> >
> >
> > On Wed, Aug 31, 2011 at 12:45 PM, zhenjiang xu <> wrote:
> >
> >> Hi R users,
> >>
> >> suppose I have two vectors,
> >>  > x=c(1,2,3,4,5)
> >>  > y=c('a','b','c','a','c')
> >> How can I get a data.frame like this?
> >> > xy
> >>  count
> >> a 5
> >> b 2
> >> c 8
> >>
> >> I know a few ways to fulfill the task. However, I have a huge number
> >> of this kind calculations, so I'd like an efficient solution. Thanks
> >>
> >> --
> >> Best,
> >> Zhenjiang
> >>
> >>
> >
> >
>



-- 
Best,
Zhenjiang



Re: [R] counting the duplicates in an object of list

2011-09-07 Thread zhenjiang xu
Thanks, Bill. match() is nice and efficient. However, I ran into a problem:

My real data is a large _list_ named "read.genes". I found conflicting results
between match() and unique(): the lengths of the outcomes are different
(and my final results are wrong too). I suspect that some different list
components are regarded as the same when they are converted to vectors (the
help page for match() says "Factors, raw vectors and lists are converted to
character vectors"). Is that possible? And, just as important, how can I fix it?

> read.genes[[1]]
[1] "YAL065C" "YAL063C" "YAR050W" "YHR211W"

> duplicates <- as.vector(table(match(read.genes, read.genes)))

> length(duplicates)
[1] 1424
> read.genes.uniq <- unique(read.genes)
> length(read.genes.uniq)
[1] 1469

> sum(duplicates)
[1] 9945348
> length(read.genes)
[1] 9945348
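
One way to sidestep the list-to-character conversion is to build the comparison keys explicitly. This is only a sketch under stated assumptions: every list element is a character vector, no string contains the separator "\r", and the function name count_list_duplicates() is made up for illustration. Its speed on a list of roughly ten million elements has not been tested here.

```r
# Assumption: every element of `lst` is a character vector and no string
# contains "\r", which is used to join the pieces into one key per element.
count_list_duplicates <- function(lst) {
  keys  <- vapply(lst, paste, character(1), collapse = "\r")
  first <- match(keys, keys)   # index of the first occurrence of each key
  tab   <- table(first)
  data.frame(first_index = as.integer(names(tab)),
             count       = as.vector(tab))
}

x <- list(a = c("1", "2"), b = c("2", "3"), c = c("1", "2"), d = c("2", "3"))
res <- count_list_duplicates(x)
res

# Sanity check: one row per unique element.
nrow(res) == length(unique(x))
```

The first_index column can be used to pull the corresponding element out of the original list, for example x[res$first_index].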

On Wed, Aug 31, 2011 at 12:42 PM, William Dunlap  wrote:

> table(match(x, x)) gives you the numbers but the labels are
> a bit more work.
>
> E.g., I'll define another list
>  > x <- list(c("1", "2", "4"), c("1", "2", "4"), 2^(0:4), 3^(1:2), 2^(0:4))
>  > tb <- table(m <- match(x, x))
>  > m
>  [1] 1 1 3 4 3
>  > tb
>
>  1 3 4
>  2 2 1
> which says that the first element of x is seen twice,
> the third twice, and the fourth once.  How to organize
> that the best depends on what you want to do with the
> data.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> > -Original Message-
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of zhenjiang xu
> > Sent: Wednesday, August 31, 2011 9:25 AM
> > To: r-help
> > Subject: [R] counting the duplicates in an object of list
> >
> > Hi all,
> >
> > I have a list x:
> >
> >  > x=list(a=c('1','2'),b=c('2','3'),c=c('1','2'),d=c('2','3'))
> >
> > I can get the unique elements with unique(), but how can I get the
> > number of duplicates for each unique elements?
> >
> > > unique(x)
> > [[1]]
> > [1] "1" "2"
> >
> > [[2]]
> > [1] "2" "3"
> >
> > Thanks
> >
> > --
> > Best,
> > Zhenjiang
> >
>



-- 
Best,
Zhenjiang



[R] how to create data.frames from vectors with duplicates

2011-08-31 Thread zhenjiang xu
Hi R users,

suppose I have two vectors,
 > x=c(1,2,3,4,5)
 > y=c('a','b','c','a','c')
How can I get a data.frame like this?
> xy
  count
a 5
b 2
c 8

I know a few ways to accomplish the task. However, I have a huge number
of calculations of this kind, so I'd like an efficient solution. Thanks

-- 
Best,
Zhenjiang



Re: [R] sum of two lists

2011-08-31 Thread zhenjiang xu
Thanks, Henrique. It works.
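
An alternative base-R sketch that adds the lists by name and keeps the element order of 'n'. It assumes, as stated in the question, that the names of 'm' are a subset of the names of 'n'; the variable name shared is illustrative only.

```r
n <- list(a = 1, b = 3, c = 5)
m <- list(b = 4)

o <- n
shared <- names(m)   # assumed to be a subset of names(n)

# Add the matching elements pairwise and write them back by name.
o[shared] <- Map(`+`, n[shared], m[shared])
o
```

Unlike the merge()-based one-liner quoted below, this keeps the elements in the order a, b, c rather than moving the shared name to the front.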

On Mon, Aug 29, 2011 at 6:45 PM, Henrique Dallazuanna  wrote:
> Try this:
> as.list(colSums(merge(m, n, all = TRUE), na.rm = TRUE))
>
> On Mon, Aug 29, 2011 at 7:39 PM, zhenjiang xu 
> wrote:
>>
>> Hi R users,
>>
>> Suppose I have two lists and the names of list 'm' are a subset of those
>> of
>> 'n', how can I sum the two lists with corresponding elements added
>> together
>> to get list 'o'?
>>
>> > n = list("a"=1,"b"=3,"c"=5)
>> > m = list('b'=4)
>> > o
>> $a
>> [1] 1
>>
>> $b
>> [1] 7
>>
>> $c
>> [1] 5
>>
>> Thanks
>>
>> --
>> Best,
>> Zhenjiang
>>
>>        [[alternative HTML version deleted]]
>>
>
>
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>



-- 
Best,
Zhenjiang



[R] counting the duplicates in an object of list

2011-08-31 Thread zhenjiang xu
Hi all,

I have a list x:

 > x=list(a=c('1','2'),b=c('2','3'),c=c('1','2'),d=c('2','3'))

I can get the unique elements with unique(), but how can I get the
number of duplicates for each unique element?

> unique(x)
[[1]]
[1] "1" "2"

[[2]]
[1] "2" "3"

Thanks

--
Best,
Zhenjiang



[R] sum of two lists

2011-08-29 Thread zhenjiang xu
Hi R users,

Suppose I have two lists and the names of list 'm' are a subset of those of
'n', how can I sum the two lists with corresponding elements added together
to get list 'o'?

> n = list("a"=1,"b"=3,"c"=5)
> m = list('b'=4)
> o
$a
[1] 1

$b
[1] 7

$c
[1] 5

Thanks

-- 
Best,
Zhenjiang



Re: [R] read.table truncated data?

2011-08-25 Thread zhenjiang xu
Thanks, Jim. quote='' works. And then I found a single quote in each of
these lines:
3262
10403
17544
24685
31826
38967

None of them is near the position where the table got truncated. Why is that?

And read.table is a great function. Is it possible for it to give a warning
message when the data gets truncated? In my case I almost overlooked the
truncation...
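
read.table() has no built-in truncation warning that I know of, but the check is easy to add by hand. The sketch below is a hypothetical wrapper (read_tsv_checked() is not part of R) that reads a tab-separated file with quoting and comment characters disabled and warns when the row count disagrees with the raw line count. It assumes one header line and no blank lines; the sample file is made up.

```r
# Hypothetical helper: read a tab-separated file with quote and comment
# handling switched off, then compare the rows read with the lines on disk.
read_tsv_checked <- function(path) {
  n_lines <- length(readLines(path))
  dat <- read.table(path, header = TRUE, sep = "\t",
                    quote = "", comment.char = "", stringsAsFactors = FALSE)
  if (nrow(dat) != n_lines - 1L) {
    warning("expected ", n_lines - 1L, " data rows but read ", nrow(dat))
  }
  dat
}

# Made-up file with stray single quotes in two of its rows.
path <- tempfile(fileext = ".tsv")
writeLines(c("id\tnote\tvalue",
             "g1\tplain\t1",
             "g2\t5' end\t2",
             "g3\tplain\t3",
             "g4\t3' end\t4"), path)

dat <- read_tsv_checked(path)
nrow(dat)
```

With the default quote setting, everything between a pair of stray quotes can be swallowed into a single field, which would explain why the quotes need not sit near the point where the rows appear to stop.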

On Thu, Aug 25, 2011 at 11:57 AM, jim holtman  wrote:

> But did you try the following:
>
> x <- read.table(, comment.char = '', quote = '')
>
> Most cases is that there is a missing quote somewhere in your data.
> use a text editor and search for single and double quotes.
>
> On Thu, Aug 25, 2011 at 11:49 AM, zhenjiang xu 
> wrote:
> > Thanks for your replies. I looked at those lines and didn't spot anything
> > unusual.
> >
> >> tail(a)
> >test_id gene_id gene   locus sample_1 sample_2 status
> > 21418 tY(GUA)J1   - SUP7 chr10:354243-354332 air1rrp6 air2rrp6 OK
> > 21419 tY(GUA)J2   - SUP4 chr10:542955-543044 air1rrp6 air2rrp6 OK
> > 21420 tY(GUA)M1   - SUP5 chr13:168794-168883 air1rrp6 air2rrp6 OK
> > 21421 tY(GUA)M2   - SUP8 chr13:837927-838016 air1rrp6 air2rrp6 OK
> > 21422  tY(GUA)O   - SUP3 chr15:288191-288280 air1rrp6 air2rrp6 OK
> > 21423  tY(GUA)Q   --   chrmt:70823-70907 air1rrp6 air2rrp6
> OK
> >  value_1 value_2 ln.fold_change. test_stat  p_value  q_value
> > significant
> > 21418 0.0  0.0.00   0.0 1.00 1.011650
> >  no
> > 21419 0.0  0.0.00   0.0 1.00 1.011480
> >  no
> > 21420 0.0  0.0.00   0.0 1.00 1.011500
> >  no
> > 21421 0.0  0.0.00   0.0 1.00 1.011520
> >  no
> > 21422 0.0  0.0.00   0.0 1.00 1.011550
> >  no
> > 21423 6.68356 10.73970.474301  -1.08614 0.277417 0.455917
> >  no
> >
> >
> > tY(GUA)J1   -   SUP7chr10:354243-354332 rrp6air1rrp6
> >   OK  0   0   0   0   11.00404  no
> > tY(GUA)J2   -   SUP4chr10:542955-543044 rrp6air1rrp6
> >   OK  0   0   0   0   11.00497  no
> > tY(GUA)M1   -   SUP5chr13:168794-168883 rrp6air1rrp6
> >   OK  0   0   0   0   11.00492  no
> > tY(GUA)M2   -   SUP8chr13:837927-838016 rrp6air1rrp6
> >   OK  0   0   0   0   11.00488  no
> > tY(GUA)O-   SUP3chr15:288191-288280 rrp6air1rrp6
> >   OK  0   0   0   0   11.00485  no
> > tY(GUA)Q-   -   chrmt:70823-70907   rrp6air1rrp6
> >   OK  4.49644 6.68356 0.396365-0.766052 0.443645
> >  0.634724no
> > 15S_rRNA-   15S_RRNAchrmt:6545-8194 WT  air2rrp6
> >   OK  2288.88 711.697 -1.168172.78772   0.00530801
> >  0.0167772   yes
> > 21S_rRNA-   21S_RRNAchrmt:58008-62447   WT
> >  air2rrp6OK  4134.59 1927.04 -0.7634 1.58991   0.111855
> >   0.22339 no
> > ETS1-1  -   ETS1-1  chr12:457732-458432 WT  air2rrp6
>  OK
> >   3258.97 1114.76 -1.072772.91211 0.00359   0.0121587
> yes
> > ETS1-2  -   ETS1-2  chr12:466869-467569 WT  air2rrp6
>  OK
> >   3258.97 1114.76 -1.072772.91211 0.00359   0.0121597
> yes
> >
> >
> > On Wed, Aug 24, 2011 at 2:34 PM, Sarah Goslee  >wrote:
> >
> >> Hi,
> >>
> >> On Wed, Aug 24, 2011 at 2:18 PM, zhenjiang xu 
> >> wrote:
> >> > Hi R users,
> >> >
> >> > I was using read.table to read a file. The data.fame looked alright,
> but
> >> I
> >> > found not all rows are read by the read.table. What's wrong with it?
> It
> >> > didn't give me any warning or error messages. Why the data are
> truncated?
> >> > Thanks.
> >> >
> >> > $ wc -l all/isoform_exp.diff
> >> > 42847 all/isoform_exp.diff
> >> >
> >> >> a=read.table('all/isoform_exp.diff', header=T, sep='\t')
> >> >> nrow(a)
> >> > [1] 21423
> >>
> >> This is a common problem. You need to take a look at the last row that
> >> was imported, and the rows around 21423 in the original file.
> >>
> >> Common causes include stray single or double quotation marks, and
> >> other sp

Re: [R] read.table truncated data?

2011-08-25 Thread zhenjiang xu
Thanks for your replies. I looked at those lines and didn't spot anything
unusual.

> tail(a)
test_id gene_id gene   locus sample_1 sample_2 status
21418 tY(GUA)J1   - SUP7 chr10:354243-354332 air1rrp6 air2rrp6 OK
21419 tY(GUA)J2   - SUP4 chr10:542955-543044 air1rrp6 air2rrp6 OK
21420 tY(GUA)M1   - SUP5 chr13:168794-168883 air1rrp6 air2rrp6 OK
21421 tY(GUA)M2   - SUP8 chr13:837927-838016 air1rrp6 air2rrp6 OK
21422  tY(GUA)O   - SUP3 chr15:288191-288280 air1rrp6 air2rrp6 OK
21423  tY(GUA)Q   --   chrmt:70823-70907 air1rrp6 air2rrp6 OK
  value_1 value_2 ln.fold_change. test_stat  p_value  q_value
significant
21418 0.0  0.0.00   0.0 1.00 1.011650
 no
21419 0.0  0.0.00   0.0 1.00 1.011480
 no
21420 0.0  0.0.00   0.0 1.00 1.011500
 no
21421 0.0  0.0.00   0.0 1.00 1.011520
 no
21422 0.0  0.0.00   0.0 1.00 1.011550
 no
21423 6.68356 10.73970.474301  -1.08614 0.277417 0.455917
 no


tY(GUA)J1   -   SUP7chr10:354243-354332 rrp6air1rrp6
   OK  0   0   0   0   11.00404  no
tY(GUA)J2   -   SUP4chr10:542955-543044 rrp6air1rrp6
   OK  0   0   0   0   11.00497  no
tY(GUA)M1   -   SUP5chr13:168794-168883 rrp6air1rrp6
   OK  0   0   0   0   11.00492  no
tY(GUA)M2   -   SUP8chr13:837927-838016 rrp6air1rrp6
   OK  0   0   0   0   11.00488  no
tY(GUA)O-   SUP3chr15:288191-288280 rrp6air1rrp6
   OK  0   0   0   0   11.00485  no
tY(GUA)Q-   -   chrmt:70823-70907   rrp6air1rrp6
   OK  4.49644 6.68356 0.396365-0.766052 0.443645
 0.634724no
15S_rRNA-   15S_RRNAchrmt:6545-8194 WT  air2rrp6
   OK  2288.88 711.697 -1.168172.78772   0.00530801
 0.0167772   yes
21S_rRNA-   21S_RRNAchrmt:58008-62447   WT
 air2rrp6OK  4134.59 1927.04 -0.7634 1.58991   0.111855
   0.22339 no
ETS1-1  -   ETS1-1  chr12:457732-458432 WT  air2rrp6OK
   3258.97 1114.76 -1.072772.91211 0.00359   0.0121587   yes
ETS1-2  -   ETS1-2  chr12:466869-467569 WT  air2rrp6OK
   3258.97 1114.76 -1.072772.91211 0.00359   0.0121597   yes


On Wed, Aug 24, 2011 at 2:34 PM, Sarah Goslee wrote:

> Hi,
>
> On Wed, Aug 24, 2011 at 2:18 PM, zhenjiang xu 
> wrote:
> > Hi R users,
> >
> > I was using read.table to read a file. The data.fame looked alright, but
> I
> > found not all rows are read by the read.table. What's wrong with it? It
> > didn't give me any warning or error messages. Why the data are truncated?
> > Thanks.
> >
> > $ wc -l all/isoform_exp.diff
> > 42847 all/isoform_exp.diff
> >
> >> a=read.table('all/isoform_exp.diff', header=T, sep='\t')
> >> nrow(a)
> > [1] 21423
>
> This is a common problem. You need to take a look at the last row that
> was imported, and the rows around 21423 in the original file.
>
> Common causes include stray single or double quotation marks, and
> other special characters in your file like the default comment.char #
>
> Sarah
> --
> Sarah Goslee
> http://www.functionaldiversity.org
>



-- 
Best,
Zhenjiang



[R] read.table truncated data?

2011-08-24 Thread zhenjiang xu
Hi R users,

I was using read.table to read a file. The data.frame looked all right, but I
found that not all rows were read by read.table. What's wrong with it? It
didn't give me any warning or error messages. Why are the data truncated?
Thanks.

$ wc -l all/isoform_exp.diff
42847 all/isoform_exp.diff

> a=read.table('all/isoform_exp.diff', header=T, sep='\t')
> nrow(a)
[1] 21423
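
A note on the symptom above: a row count far below what wc -l reports, with no warning, is what one would expect if a quote character or a # inside a field interferes with how read.table splits the file, as suggested in the reply in this thread. The sketch below is only an illustration under that assumption. The demo file and its contents are made up, and the actual cause in all/isoform_exp.diff is not confirmed anywhere in the thread.

```r
# Hypothetical demo file (not the poster's data): tab-separated, with fields
# that contain a stray single quote and a '#' character.
tmp <- tempfile(fileext = ".diff")
writeLines(c("id\tgene\tvalue",
             "t1\tSUP5\t1.0",
             "t2\t5'-ETS\t2.0",
             "t3\tg#3\t3.0",
             "t4\tETS1-1\t4.0",
             "t5\tb'\t5.0"), tmp)

# Count fields per line with quoting and comment handling switched off.
# Every line should report the same number of fields.
fields <- count.fields(tmp, sep = "\t", quote = "", comment.char = "")
print(fields)

# Read the file with quoting and comment handling switched off, so that
# ' and # are treated as ordinary characters.
a <- read.table(tmp, header = TRUE, sep = "\t",
                quote = "", comment.char = "")
print(nrow(a))
```

If count.fields() gives uneven counts under its default settings but even counts once quote = "" and comment.char = "" are set, quoting or comment characters are the likely culprit.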

-- 
Best,
Zhenjiang



Re: [R] a question on list manipulation

2011-08-06 Thread zhenjiang xu
Unfortunately the list names in my real data are irregular, with mixed
digits and letters at the end. This is a good idea, though. It inspired me
to write another solution based on it:

> x <- list(A=c("d", "e", "f"), B=c("d", "e"), C=c("d","g"))
> tmp <- unlist(x, use.names=F)
> a = unlist(lapply(x, length))
> tmp2 = rep(names(a), a)
> x.new = split(tmp2, tmp)

I tested it on my data: the version with for loops took over an hour, while
the vectorized version finishes in a second. Thanks to all of you.
Hooray~


On Fri, Aug 5, 2011 at 3:31 PM, Greg Snow  wrote:
> Here is one approach, whether it is better than the basic loop or not is up 
> to you:
>
>> x <- list(A=c("d", "e", "f"), B=c("d", "e"), C=c("d"))
>>
>> tmp <- unlist(x)
>> tmp2 <- sub( '[0-9]+$', '', names(tmp) )
>>
>> x.new <- split( tmp2, tmp )
>> x.new
> $d
> [1] "A" "B" "C"
>
> $e
> [1] "A" "B"
>
> $f
> [1] "A"
>
>
> Of course this version will have some problems if the names of your list 
> elements end with digits that you don't want stripped off (but you can work 
> around that by preprocessing the list names).
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111
>
>
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
>> project.org] On Behalf Of zhenjiang xu
>> Sent: Friday, August 05, 2011 11:04 AM
>> To: Duncan Murdoch
>> Cc: r-help
>> Subject: Re: [R] a question on list manipulation
>>
>> Exactly! Sorry for the misunderstanding. The uppercase/lowercase is
>> only a toy example (and a bad one; yours is better than mine). My
>> question is a more general one: a list is basically a one-to-many
>> matching, from the names of a list to the elements belonging to each
>> name. I'd like to reverse the matching, from all the elements to the
>> names of the list.
>>
>> On Fri, Aug 5, 2011 at 12:53 PM, Duncan Murdoch
>>  wrote:
>> > On 05/08/2011 12:05 PM, zhenjiang xu wrote:
>> >>
>> >> Hi R users,
>> >>
>> >> I have a list:
>> >> >  x
>> >> $A
>> >> [1] "a"  "b"  "c"
>> >> $B
>> >> [1] "b"  "c"
>> >> $C
>> >> [1] "c"
>> >>
>> >> I want to convert it to a lowercase-to-uppercase list like this:
>> >> >  y
>> >> $a
>> >> [1] "A"
>> >> $b
>> >> [1] "A"  "B"
>> >> $c
>> >> [1] "A"  "B"  "C"
>> >>
>> >> In a word, I want to reverse the list names and the elements under
>> >> each list name. Is there any quick way to do that? Thanks
>> >
>> > I interpreted this question differently from the others, and your
>> example is
>> > ambiguous as to which is the right interpretation.  I thought you
>> wanted to
>> > swap names and elements,  so
>> >
>> >> x <- list(A=c("d", "e", "f"), B=c("d", "e"), C=c("d"))
>> >> x
>> > $A
>> > [1] "d" "e" "f"
>> >
>> > $B
>> > [1] "d" "e"
>> >
>> > $C
>> > [1] "d"
>> >
>> > would become
>> >
>> >> list(d=c("A", "B", "C"), e=c("A", "B"), f="A")
>> > $d
>> > [1] "A" "B" "C"
>> >
>> > $e
>> > [1] "A" "B"
>> >
>> > $f
>> > [1] "A"
>> >
>> > I don't know a slick way to do this; I'd just do it by brute force,
>> looping
>> > over the names of x.
>> >
>> > Duncan Murdoch
>> >
>>
>>
>>
>> --
>> Best,
>> Zhenjiang
>>
>



-- 
Best,
Zhenjiang



Re: [R] a question on list manipulation

2011-08-06 Thread zhenjiang xu
This is a nice solution. Thanks, Dennis. But I am afraid that if the length
of the list x differs from the length of x2, there will be an error,
since lapply returns a list of the same length as x:

> x <- list(A=c("d", "e", "f"), B=c("d", "e"), C=c("d","g"))
> x2 <- unique(unlist(x))
> w <- lapply(x, function(u) names(x)[which(x2 %in% u)])
> names(w) <- x2
Error in names(w) <- x2 :
  'names' attribute [4] must be the same length as the vector [3]
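
The error arises because lapply() iterates over x (three components) while the names come from x2 (four unique elements). A minimal sketch of one possible repair, assuming the goal is one output component per unique element, is to iterate over x2 instead and ask, for each element, which components of x contain it. This is an illustration, not the code that was eventually used in the thread.

```r
x <- list(A = c("d", "e", "f"), B = c("d", "e"), C = c("d", "g"))
x2 <- unique(unlist(x, use.names = FALSE))

# For each unique element v, collect the names of the list components
# that contain v. The result has one component per element of x2.
w <- lapply(x2, function(v) {
  names(x)[vapply(x, function(u) v %in% u, logical(1))]
})
names(w) <- x2
print(w)
```

This still scans the whole list once per unique element, so on large data the split() approach posted elsewhere in this thread is likely to be faster.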


On Fri, Aug 5, 2011 at 3:23 PM, Dennis Murphy  wrote:
> Hi:
>
> Your clarification suggests Duncan was on the right track, so how about this:
>
> x <- list(A=c("d", "e", "f"), B=c("d", "e"), C=c("d"))
> x2 <- unique(unlist(x))
> w <- lapply(x, function(u) names(x)[which(x2 %in% u)])
> names(w) <- x2
> w
> $d
> [1] "A" "B" "C"
>
> $e
> [1] "A" "B"
>
> $f
> [1] "A"
>
> HTH,
> Dennis
>
> On Fri, Aug 5, 2011 at 10:04 AM, zhenjiang xu  wrote:
>> Exactly! Sorry for the misunderstanding. The uppercase/lowercase is
>> only a toy example (and a bad one; yours is better than mine). My
>> question is a more general one: a list is basically a one-to-many
>> matching, from the names of a list to the elements belonging to each
>> name. I'd like to reverse the matching, from all the elements to the
>> names of the list.
>>
>> On Fri, Aug 5, 2011 at 12:53 PM, Duncan Murdoch
>>  wrote:
>>> On 05/08/2011 12:05 PM, zhenjiang xu wrote:
>>>>
>>>> Hi R users,
>>>>
>>>> I have a list:
>>>> >  x
>>>> $A
>>>> [1] "a"  "b"  "c"
>>>> $B
>>>> [1] "b"  "c"
>>>> $C
>>>> [1] "c"
>>>>
>>>> I want to convert it to a lowercase-to-uppercase list like this:
>>>> >  y
>>>> $a
>>>> [1] "A"
>>>> $b
>>>> [1] "A"  "B"
>>>> $c
>>>> [1] "A"  "B"  "C"
>>>>
>>>> In a word, I want to reverse the list names and the elements under
>>>> each list name. Is there any quick way to do that? Thanks
>>>
>>> I interpreted this question differently from the others, and your example is
>>> ambiguous as to which is the right interpretation.  I thought you wanted to
>>> swap names and elements,  so
>>>
>>>> x <- list(A=c("d", "e", "f"), B=c("d", "e"), C=c("d"))
>>>> x
>>> $A
>>> [1] "d" "e" "f"
>>>
>>> $B
>>> [1] "d" "e"
>>>
>>> $C
>>> [1] "d"
>>>
>>> would become
>>>
>>>> list(d=c("A", "B", "C"), e=c("A", "B"), f="A")
>>> $d
>>> [1] "A" "B" "C"
>>>
>>> $e
>>> [1] "A" "B"
>>>
>>> $f
>>> [1] "A"
>>>
>>> I don't know a slick way to do this; I'd just do it by brute force, looping
>>> over the names of x.
>>>
>>> Duncan Murdoch
>>>
>>
>>
>>
>> --
>> Best,
>> Zhenjiang
>>
>>
>



-- 
Best,
Zhenjiang



Re: [R] a question on list manipulation

2011-08-05 Thread zhenjiang xu
Exactly! Sorry for the misunderstanding. The uppercase/lowercase is
only a toy example (and a bad one; yours is better than mine). My
question is a more general one: a list is basically a one-to-many
matching, from the names of a list to the elements belonging to each
name. I'd like to reverse the matching, from all the elements to the
names of the list.

On Fri, Aug 5, 2011 at 12:53 PM, Duncan Murdoch
 wrote:
> On 05/08/2011 12:05 PM, zhenjiang xu wrote:
>>
>> Hi R users,
>>
>> I have a list:
>> >  x
>> $A
>> [1] "a"  "b"  "c"
>> $B
>> [1] "b"  "c"
>> $C
>> [1] "c"
>>
>> I want to convert it to a lowercase-to-uppercase list like this:
>> >  y
>> $a
>> [1] "A"
>> $b
>> [1] "A"  "B"
>> $c
>> [1] "A"  "B"  "C"
>>
>> In a word, I want to reverse the list names and the elements under
>> each list name. Is there any quick way to do that? Thanks
>
> I interpreted this question differently from the others, and your example is
> ambiguous as to which is the right interpretation.  I thought you wanted to
> swap names and elements,  so
>
>> x <- list(A=c("d", "e", "f"), B=c("d", "e"), C=c("d"))
>> x
> $A
> [1] "d" "e" "f"
>
> $B
> [1] "d" "e"
>
> $C
> [1] "d"
>
> would become
>
>> list(d=c("A", "B", "C"), e=c("A", "B"), f="A")
> $d
> [1] "A" "B" "C"
>
> $e
> [1] "A" "B"
>
> $f
> [1] "A"
>
> I don't know a slick way to do this; I'd just do it by brute force, looping
> over the names of x.
>
> Duncan Murdoch
>



-- 
Best,
Zhenjiang



Re: [R] how to control to save plots to which dev

2011-08-05 Thread zhenjiang xu
Yes, but I thought the parameter to dev.set() should only be the value
returned by dev.next()/dev.prev(). So I read the help page again. It's a
little embarrassing - I missed the sentence "Devices are associated
with ... a number in the range 1 to 63". I should have read the help
page more carefully. Thanks.
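
For the archive, a minimal sketch of the dev.set() approach discussed here: record each device number with dev.cur() right after opening the device, then switch explicitly before each plot. The file names, the loop length, and the plotted data are placeholders chosen for illustration.

```r
# Open two PDF devices and remember the device number of each one.
file_a <- tempfile(fileext = ".pdf")   # stand-in for "a.pdf"
file_b <- tempfile(fileext = ".pdf")   # stand-in for "b.pdf"
pdf(file_a); dev_a <- dev.cur()
pdf(file_b); dev_b <- dev.cur()

for (i in 1:3) {
  dev.set(dev_a)                       # make device A current
  plot(rnorm(10), main = paste("type A, plot", i))
  dev.set(dev_b)                       # make device B current
  hist(rnorm(100), main = paste("type B, plot", i))
}

# Close each device by number, so the order of closing does not matter.
dev.off(dev_a)
dev.off(dev_b)
```

Keeping the numbers in named variables avoids having to track which device is current when more than two are open.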

On Fri, Aug 5, 2011 at 12:02 PM, Duncan Murdoch
 wrote:
> On 05/08/2011 11:49 AM, zhenjiang xu wrote:
>>
>> Thanks, Prof Ripley. I was using dev.next() and dev.prev(), but I am
>> wondering whether, instead of switching the current dev, there is a way to
>> more directly print plot A into file connection A, plot B into file
>> connection B...? With more than two devices open
>> simultaneously, one could easily get confused about which dev is the current
>> one.
>
> dev.set() will do exactly that (and Prof. Ripley did point you to it).
>
> Duncan Murdoch
>>
>> On Tue, Aug 2, 2011 at 1:28 AM, Prof Brian Ripley
>>  wrote:
>> >  On Tue, 2 Aug 2011, David Winsemius wrote:
>> >
>> >>
>> >>  On Aug 1, 2011, at 11:14 PM, zhenjiang xu wrote:
>> >>
>> >>>  Hi,
>> >>>
>> >>>  I have a for loop to make 2 types of plots and I'd like to save one
>> >>>  type of plots to a pdf file and the other to another pdf file. How
>> >>> can
>> >>>  I control which plot will be saved to which pdf? Thanks
>> >>
>> >>  Why not give them file names that identify the type?
>> >
>> >  I think he wants
>> >
>> >  pdf("a.pdf")
>> >  pdf("b.pdf")
>> >  for(i in 1:n) {
>> >  plot something on a.pdf
>> >  plot something on b.pdf
>> >  }
>> >
>> >  This is done using dev.prev/dev.next/dev.set: see their help for
>> > details.
>> >
>> >>
>> >>  --
>> >>
>> >>  David Winsemius, MD
>> >>  West Hartford, CT
>> >>
>> >>
>> >
>> >  --
>> >  Brian D. Ripley,                  rip...@stats.ox.ac.uk
>> >  Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>> >  University of Oxford,             Tel:  +44 1865 272861 (self)
>> >  1 South Parks Road,                     +44 1865 272866 (PA)
>> >  Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>> >
>>
>>
>>
>
>



-- 
Best,
Zhenjiang



[R] a question on list manipulation

2011-08-05 Thread zhenjiang xu
Hi R users,

I have a list:
> x
$A
[1] "a"  "b"  "c"
$B
[1] "b"  "c"
$C
[1] "c"

I want to convert it to a lowercase-to-uppercase list like this:
> y
$a
[1] "A"
$b
[1] "A"  "B"
$c
[1] "A"  "B"  "C"

In a word, I want to reverse the list names and the elements under
each list name. Is there any quick way to do that? Thanks
-- 
Best,
Zhenjiang



Re: [R] how to control to save plots to which dev

2011-08-05 Thread zhenjiang xu
Thanks, Prof Ripley. I was using dev.next() and dev.prev(), but I am
wondering whether, instead of switching the current dev, there is a way to
more directly print plot A into file connection A, plot B into file
connection B...? With more than two devices open
simultaneously, one could easily get confused about which dev is the current
one.

On Tue, Aug 2, 2011 at 1:28 AM, Prof Brian Ripley  wrote:
> On Tue, 2 Aug 2011, David Winsemius wrote:
>
>>
>> On Aug 1, 2011, at 11:14 PM, zhenjiang xu wrote:
>>
>>> Hi,
>>>
>>> I have a for loop to make 2 types of plots and I'd like to save one
>>> type of plots to a pdf file and the other to another pdf file. How can
>>> I control which plot will be saved to which pdf? Thanks
>>
>> Why not give them file names that identify the type?
>
> I think he wants
>
> pdf("a.pdf")
> pdf("b.pdf")
> for(i in 1:n) {
> plot something on a.pdf
> plot something on b.pdf
> }
>
> This is done using dev.prev/dev.next/dev.set: see their help for details.
>
>>
>> --
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>>
>
> --
> Brian D. Ripley,                  rip...@stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>



-- 
Best,
Zhenjiang



[R] how to control to save plots to which dev

2011-08-01 Thread zhenjiang xu
Hi,

I have a for loop to make 2 types of plots and I'd like to save one
type of plots to a pdf file and the other to another pdf file. How can
I control which plot will be saved to which pdf? Thanks

-- 
Best,
Zhenjiang



Re: [R] how to add two data.frame with the same column but different row numbers

2011-04-15 Thread zhenjiang xu
Thanks, Gabor. It's a nice workaround. I'll look more at the zoo library.

On Fri, Apr 15, 2011 at 7:10 PM, Gabor Grothendieck  wrote:

> On Fri, Apr 15, 2011 at 6:10 PM, zhenjiang xu 
> wrote:
> > Thanks, Dennis! I'll go with it. It's surprising there is no ready way to
> > do that. I imagine it should be a common data manipulation to add two
> > data.frames from two different sources. It could happen that one data.frame
> > is missing some rows while the other has some more.
> >
>
> If you represent them as zoo series then you can do it using +
> (although the definition of + is different than in your post).   Here
> "a", "b" and "c" are the "times":
>
> library(zoo)
> a <- zoo(1:3, letters[1:3])
> b <- zoo(c(6, 1), c("a", "c"))
> a+b
>
> The last line gives:
>
> > a+b
> a c
> 7 4
>
> To use the definition in your post one could do this (which has the
> effect of modifying b so that a+b works as in your post):
>
> merge(a, b, fill = 0, retclass = NULL)
> a+b
>
> The last line gives:
>
> > a+b
> a b c
> 7 2 4
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>



-- 
Best,
Zhenjiang



Re: [R] how to add two data.frame with the same column but different row numbers

2011-04-15 Thread zhenjiang xu
Thanks, Dennis! I'll go with it. It's surprising there is no ready way to do
that. I imagine it should be a common data manipulation to add two
data.frames from two different sources. It could happen that one data.frame
is missing some rows while the other has some more.

On Fri, Apr 15, 2011 at 5:10 PM, Dennis Murphy  wrote:

> Hi:
>
> Here's one approach:
>
> > df1 <- data.frame(x = letters[1:3], y = 1:3)
> > df2 <- data.frame(x = c('a', 'c'), z = c(6, 1))
> > dfm <- merge(df1, df2, all.x = TRUE)
> > dfm
>  x y  z
> 1 a 1  6
> 2 b 2 NA
> 3 c 3  1
> sumdf <- data.frame(x = dfm$x, y = rowSums(dfm[, -1], na.rm = TRUE))
>  x y
> 1 a 7
> 2 b 2
> 3 c 4
>
> HTH,
> Dennis
>
> On Fri, Apr 15, 2011 at 1:31 PM, zhenjiang xu 
> wrote:
> > Hi all,
> >
> > Suppose I have two data.frames, a and b. How can I add them together to
> > get c? Thanks
> >> a
> >  A
> > a 1
> > b 2
> > c 3
> >
> >> b
> >  A
> > a 6
> > c 1
> >
> >> c
> >   A
> > a 7
> > b 2
> > c 4
> >
> > --
> > Best,
> > Zhenjiang
> >
> >
>



-- 
Best,
Zhenjiang



[R] how to add two data.frame with the same column but different row numbers

2011-04-15 Thread zhenjiang xu
Hi all,

Suppose I have two data.frames, a and b. How can I add them together to get c?
Thanks
> a
  A
a 1
b 2
c 3

> b
  A
a 6
c 1

> c
   A
a 7
b 2
c 4
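
A minimal sketch of one way to obtain c from a and b as shown above, assuming the rows are identified by row names and that a row missing from one data.frame should count as 0. The helper name align() is made up for this illustration; the replies in this thread use merge() and zoo instead.

```r
a <- data.frame(A = c(1, 2, 3), row.names = c("a", "b", "c"))
b <- data.frame(A = c(6, 1),    row.names = c("a", "c"))

# Union of the row labels, in order of first appearance.
rows <- union(rownames(a), rownames(b))

# Hypothetical helper: expand a data.frame to the full set of rows.
# Rows that the data.frame lacks come back as NA and are then set to 0.
align <- function(df, rows) {
  out <- df[rows, , drop = FALSE]
  rownames(out) <- rows
  out[is.na(out)] <- 0
  out
}

c_sum <- align(a, rows) + align(b, rows)
print(c_sum)
```

Note that this also turns genuine NA values in the inputs into 0, which may or may not be what is wanted.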

-- 
Best,
Zhenjiang



[R] how to reshape the data.frame from long to wide in a specific order

2011-03-14 Thread zhenjiang xu
Hi,

For example, the data.frame like:

origdata.long <- read.table(header=T, con <- textConnection('
 subject sex condition measurement
   1   M   control 7.9
   1   M first12.3
   1   Msecond10.7
   2   F   control 6.3
   2   F first10.6
   2   Fsecond11.1
   3   F   control 9.5
   3   F first13.1
   3   Fsecond13.8
   4   M   control11.5
   4   M first13.4
   4   Msecond12.9
 '))
close(con)

Given a vector c('first', 'second', 'control'), how can I reshape the
data.frame to this?
# subject sex  first second control
#       1   M   12.3   10.7     7.9
#       2   F   10.6   11.1     6.3
#       3   F   13.1   13.8     9.5
#       4   M   13.4   12.9    11.5

I know reshape() can transform the data.frame from long to wide, but it
does not seem able to control the order of the columns.

Thanks ahead of time
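
A minimal sketch of one way to get the requested column order, assuming it is acceptable to reshape first and reorder afterwards. The data are re-entered with data.frame() so the example is self-contained, and the variable name ord is made up for this illustration.

```r
origdata.long <- data.frame(
  subject     = rep(1:4, each = 3),
  sex         = rep(c("M", "F", "F", "M"), each = 3),
  condition   = rep(c("control", "first", "second"), times = 4),
  measurement = c(7.9, 12.3, 10.7,  6.3, 10.6, 11.1,
                  9.5, 13.1, 13.8, 11.5, 13.4, 12.9)
)
ord <- c("first", "second", "control")   # desired column order

wide <- reshape(origdata.long,
                idvar = c("subject", "sex"),
                timevar = "condition",
                v.names = "measurement",
                direction = "wide")

# reshape() names the new columns "measurement.<condition>";
# strip the prefix, then pick the columns in the requested order.
names(wide) <- sub("^measurement\\.", "", names(wide))
wide <- wide[, c("subject", "sex", ord)]
rownames(wide) <- NULL
print(wide)
```

The reordering is an ordinary column selection, so it works the same way for any vector of condition names.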
-- 
Best,
Zhenjiang



[R] ggplot2 problem in interacting mode

2010-11-10 Thread zhenjiang xu
Hi all,

When running R interactively, I get the following problem:

> library(ggplot2)
Loading required package: reshape
Loading required package: plyr

Attaching package: 'reshape'

The following object(s) are masked from 'package:plyr':

round_any

Loading required package: grid
Loading required package: proto

> data(VADeaths)
> pg <- ggplot(melt(VADeaths), aes(value, X1)) + geom_point() +
+ facet_wrap(~X2) + ylab("")
> print(pg)
Error in get("transform", env = ., inherits = TRUE)(., ...) :
  attempt to apply non-function

My R package information is :
> library(plyr)
> sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-pc-linux-gnu

locale:
 [1] LC_CTYPE=zh_CN.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] lattice_0.18-8 ggplot2_0.8.8  proto_0.3-8    reshape_0.8.3  plyr_1.2.1


loaded via a namespace (and not attached):
[1] tools_2.11.1


The interesting thing is that when I put the code into an R script and run it
with the command "R CMD BATCH XX.R", it works all right. Does anyone have any
idea what the problem is? Thanks~
-- 
Best,
Zhenjiang



[R] error bars in lattice barchart

2010-11-10 Thread zhenjiang xu
Hi all,

I've read the emails of Dan, Deepayan and Sundar about adding error bars to
lattice plots (
https://stat.ethz.ch/pipermail/r-help/2006-October/114883.html), but I still
have a problem when I try to add error bars to a barchart. I tried both
Deepayan's and Sundar's solutions, but without luck. Here is my code (I
changed prepanel.ci and panel.ci a little to plot bars vertically):


## Sundar's solution ###
prepanel.ci <- function(x, y, ly, uy, subscripts, ...) {
y <- as.numeric(y)
ly <- as.numeric(ly[subscripts])
uy <- as.numeric(uy[subscripts])
list(ylim = range(y, uy, ly, finite = TRUE))
}

panel.ci <- function(x, y, ly, uy, subscripts,
  groups = NULL, pch = 16, ...) {
   x <- as.numeric(x)
   y <- as.numeric(y)
   ly <- as.numeric(ly[subscripts])
   uy <- as.numeric(uy[subscripts])
   par <- if(is.null(groups))"plot.symbol" else "superpose.symbol"
   sym <- trellis.par.get(par)
   col <- sym$col
   groups <- if(!is.null(groups)) {
 groups[subscripts]
   } else {
 rep(1, along = x)
   }
   ug <- unique(groups)
   for(i in seq(along = ug)) {
 subg <- groups == ug[i]
 y.g <- y[subg]
 x.g <- x[subg]
 ly.g <- ly[subg]
 uy.g <- uy[subg]
 panel.abline(h = unique(y.g), col = "grey")
 panel.arrows(ly.g, y.g, uy.g, y.g, col = 'black',
  length = 0.25, unit = "native",
  angle = 90, code = 3)
 panel.barchart(x.g, y.g, pch = pch, col = col[i], ...)
   }
}


all = barchart(
  Score ~ Methods | Score.Name * RNA.Type,
  data = benchmark,
  box.ratio = 1.2,
  xlab = 'Methods',
  ylab = 'Percentage',
  groups = Seq.Number,
  layout = c(2, 5), # 2 columns per row
  between = list( y = 0.5, x = 0 ),
#  par.settings = list(fontsize=list(text=8)),
  ## specify the colors used for bars
  par.settings = list(fontsize=list(text=8), superpose.polygon = list(border
= 'black', col = c('white', 'gray', 'black'))),
  par.strip.text = list(cex=0.9),
  auto.key = list(space = 'top', columns = 3, cex = 0.7),
#  key = key.variety,
#  index.cond = list(c('tRNA', '5S rRNA', 'SRP RNA', 'RNase P', '16S rRNA')),
#  index.cond = list(rep(1,6)),
#  ylim = my.ylim,
  scales = list(x = list(rot = 45), y=list(tck = 0.4, rot = 0, relation =
'free')),
  ly = benchmark$Score - benchmark$Error,
  uy = benchmark$Score + benchmark$Error,
  prepanel = prepanel.ci,
  panel.groups = panel.ci
  )


## Deepayan's solution ###

prepanel.ci <- function(x, y, ly, uy, subscripts, ...) {
y <- as.numeric(y)
ly <- as.numeric(ly[subscripts])
uy <- as.numeric(uy[subscripts])
list(ylim = range(y, uy, ly, finite = TRUE))
}

panel.ci <- function(x, y, ly, uy, subscripts, ...) {
x <- as.numeric(x)
y <- as.numeric(y)
ly <- as.numeric(ly[subscripts])
uy <- as.numeric(uy[subscripts])
panel.barchart(x, y, ...)
panel.arrows(x, ly, x, uy, col = 'black',
 length = 0.1, unit = "native",
 angle = 90, code = 3)
}

all = barchart(
  Score ~ Methods | Score.Name * RNA.Type,
  data = benchmark,
  box.ratio = 1.2,
  xlab = 'Methods',
  ylab = 'Percentage',
  groups = Seq.Number,
  layout = c(2, 5), # 2 columns per row
  between = list( y = 0.5, x = 0 ),
#  par.settings = list(fontsize=list(text=8)),
  ## specify the colors used for bars
  par.settings = list(fontsize=list(text=8), superpose.polygon = list(border
= 'black', col = c('white', 'gray', 'black'))),
  par.strip.text = list(cex=0.9),
  auto.key = list(space = 'top', columns = 3, cex = 0.7),
#  key = key.variety,
#  index.cond = list(c('tRNA', '5S rRNA', 'SRP RNA', 'RNase P', '16S rRNA')),
#  index.cond = list(rep(1,6)),
#  ylim = my.ylim,
  scales = list(x = list(rot = 45), y=list(tck = 0.4, rot = 0, relation =
'free')),
  ly = benchmark$Score - benchmark$Error,
  uy = benchmark$Score + benchmark$Error,
  prepanel = prepanel.ci,

  panel.groups = panel.ci,


  panel = panel.superpose  )



Sundar's solution gives me the exact same original plot without error
bars, and Deepayan's solution gives me a messy plot. Did I mess up
anything in these two solutions? I'd appreciate any help from you
experts. Thanks

-- 
Best,
Zhenjiang



[R] matrix problem

2010-08-10 Thread zhenjiang xu
Hi,

I have a file like this:
1 2 0.1
2 3 0.2
3 1 0.3

And I want to read it to create a matrix like this:
     [,1] [,2] [,3]
[1,]  0    0.1  0
[2,]  0    0    0.2
[3,]  0.3  0    0

How can I do it efficiently? Thanks.
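
A minimal sketch of one way to do this, assuming the file holds (row, column, value) triples and the matrix is square with its size given by the largest index. It relies on indexing a matrix with a two-column matrix of positions. The temporary file stands in for the real input.

```r
# Write the three example lines to a temporary file (stand-in for the real file).
tmp <- tempfile()
writeLines(c("1 2 0.1", "2 3 0.2", "3 1 0.3"), tmp)

d <- read.table(tmp, col.names = c("row", "col", "value"))

# Start from an all-zero matrix and fill in the listed cells in one step.
n <- max(d$row, d$col)
m <- matrix(0, nrow = n, ncol = n)
m[cbind(d$row, d$col)] <- d$value
print(m)
```

The assignment is vectorized, so no loop over the lines of the file is needed.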
-- 
Best,
Zhenjiang



[R] how to display the value of each data points on the levelplot

2010-05-05 Thread zhenjiang xu
Hi R users,

How can I display the corresponding value inside each little square of the
levelplot produced by the following code?
> data(Cars93, package = "MASS")
> cor.Cars93 <- cor(Cars93[, !sapply(Cars93, is.factor)], use = "pair")
> levelplot(cor.Cars93, aspect = 1, scales = list(x = list(rot = 90)))

This is an example from the book "Lattice: Multivariate Data Visualization
with R". I know there is an example (Fig 13.5) showing how to do levelplot
with data labels and ellipse shape. But here I want to keep the square
shape.
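
A minimal sketch of one common approach, assuming a custom panel function is acceptable: draw the usual squares with panel.levelplot() and then write the values on top with panel.text(). The two-decimal format and the cex value are arbitrary choices made for this illustration.

```r
library(lattice)

data(Cars93, package = "MASS")
cor.Cars93 <- cor(Cars93[, !sapply(Cars93, is.factor)], use = "pair")

p <- levelplot(cor.Cars93, aspect = 1,
               scales = list(x = list(rot = 90)),
               panel = function(...) {
                 args <- list(...)
                 # Draw the coloured squares as usual ...
                 panel.levelplot(...)
                 # ... then print each value at the centre of its square.
                 panel.text(args$x, args$y,
                            labels = sprintf("%.2f", args$z), cex = 0.4)
               })
```

Calling print(p) on an open graphics device draws the plot. With this many variables the labels are small, so the text size may need adjusting.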
-- 
Best,
Zhenjiang



Re: [R] a question on autocorrelation acf

2010-04-30 Thread zhenjiang xu
Thanks, Duncan, but there is no reference in ?acf. The only possibly
related material is

"Author(s):

 Original: Paul Gilbert, Martyn Plummer. Extensive modifications
 and univariate case of 'pacf' by B.D. Ripley."

And I didn't find anything by searching Google for it.


On Thu, Apr 29, 2010 at 7:08 PM, Duncan Murdoch wrote:

> On 29/04/2010 6:22 PM, zhenjiang xu wrote:
>
>> Hi R users,
>>
>> where can I find the equations used by acf function to calculate
>> autocorrelation?
>>
>
> See the reference listed in ?acf.
>
> Duncan Murdoch
>
>
>> I think I misunderstand acf. Doesn't acf use the following
>> equation to calculate autocorrelation?
>>
>>     R(\tau) = \frac{\operatorname{E}[(X_t - \mu)(X_{t+\tau} - \mu)]}{\sigma^2}
>>
>> If it does, then the autocorrelation of a sine function should give a
>> cosine; however, the following code gives a cosine-shape function with its
>> magnitude decreasing along the lag.
>> x = c(1:500)
>> x = x/10
>> x = sin(x)
>> acf(x, type='correlation', lag.max=length(x)-1)
>>
>>
>>
>
>


-- 
Best,
Zhenjiang



[R] a question on autocorrelation acf

2010-04-29 Thread zhenjiang xu
Hi R users,

where can I find the equations used by acf function to calculate
autocorrelation? I think I misunderstand acf. Doesn't acf use the following
equation to calculate autocorrelation?

    R(\tau) = \frac{\operatorname{E}[(X_t - \mu)(X_{t+\tau} - \mu)]}{\sigma^2}

If it does, then the autocorrelation of a sine function should give a
cosine; however, the following code gives a cosine-shape function with its
magnitude decreasing along the lag.
x = c(1:500)
x = x/10
x = sin(x)
acf(x, type='correlation', lag.max=length(x)-1)
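
A note on the decay: the sample estimate that acf() computes divides the lag-k sum of products by n, the length of the series, not by the n - k pairs that actually overlap. At large lags few pairs remain, so the estimate shrinks toward zero even for a pure sine. The sketch below recomputes the estimate by hand to show this. The function name manual_acov is made up for the illustration.

```r
x <- sin((1:500) / 10)
n <- length(x)
xc <- x - mean(x)                      # centred series

# Lag-k sample autocovariance with divisor n (not n - k).
manual_acov <- function(k) sum(xc[1:(n - k)] * xc[(1 + k):n]) / n

lags <- 0:(n - 1)
r_manual <- vapply(lags, function(k) manual_acov(k) / manual_acov(0),
                   numeric(1))

# The values returned by acf() for the same lags.
r_acf <- acf(x, type = "correlation", lag.max = n - 1, plot = FALSE)$acf[, 1, 1]
print(all.equal(r_manual, r_acf))

# Rescaling by n / (n - k) gives the version that divides by the number
# of overlapping pairs, which does not shrink in the same way.
r_rescaled <- r_manual * n / (n - lags)
```

Plotting r_rescaled against lags shows a curve much closer to the expected cosine, although it becomes noisy at the largest lags where only a handful of pairs contribute.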

-- 
Best,
Zhenjiang



Re: [R] how to reorder of groups and specify ylim for each row in lattice barchart

2010-04-23 Thread zhenjiang xu
Yes. I put the real ranges instead of '...'. But I tried the following code
and it works. This is great! Thank you. Previously I thought you said ylim
was put inside the scales().

library(lattice)
barchart(yield ~ variety | site,data=barley, groups = year, layout =
c(1,6),auto.key = list(points = FALSE, rectangles = TRUE, space =
"right"),ylab = "Barley Yield (bushels/acre)",scales = list(x = list(rot =
45), y=list(relation='free')), ylim = mylist)



On Fri, Apr 23, 2010 at 11:51 AM, Peter Ehlers  wrote:

> Works for me. Did you replace the '...' in mylist()
> with appropriate c(,) code? For example:
>
> mylist <- list(c(0,30), c(40,80), c(0,50),
>   c(0,50), c(0,50), c(0,50))
>
>  -Peter Ehlers
>
>
> On 2010-04-23 9:22, zhenjiang xu wrote:
>
>> Peter, thanks, but that doesn't work. Did I miss something?
>>
>> library(lattice)
>> mylist<- list(c(0,30), c(40,80), ...)
>> barchart(yield ~ variety | site,data=barley, groups = year, layout =
>> c(1,6),auto.key = list(points = FALSE, rectangles = TRUE, space =
>> "right"),ylab = "Barley Yield (bushels/acre)",scales = list(x = list(rot =
>> 45), y=list(relation='free', ylim=mylist)))
>>
>> On Thu, Apr 22, 2010 at 7:54 PM, Peter Ehlers  wrote:
>>
>>  On 2010-04-21 21:13, zhenjiang xu wrote:
>>>
>>>  R experts,
>>>>
>>>> Is there any way to reorder inside each group? In the following example,
>>>> the bar of year 1932 is always plotted before the bar of year 1931. May I
>>>> change the order inside each group of bars?
>>>>
>>>>
>>>>  Do you mean a different order in different panels? That seems to
>>> me to defeat the purpose of panels. I can't think of an easy way
>>> to do that.
>>>
>>>
>>>  library(lattice)
>>>
>>>> barchart(yield ~ variety | site,data=barley, groups = year, layout =
>>>> c(1,6),auto.key = list(points = FALSE, rectangles = TRUE, space =
>>>> "right"),ylab = "Barley Yield (bushels/acre)",scales = list(x = list(rot
>>>> =
>>>> 45)))
>>>>
>>>>
>>>> Another question: may I specify different y-axis ranges for each
>>>> row of the panels? In the example above, how can I change the y range
>>>> of the "Waseca" panel to (40,70) from the default (0,80)? Please note I
>>>> don't want to set the argument "scales = list(y=list(relation='free'))",
>>>> because the automatic choice of ranges for different panels isn't good
>>>> enough for me. Basically I'd like to control the y ranges manually.
>>>>
>>>>
>>> You can use scales() with
>>>  y=list(relation='free', ylim=mylist)
>>>
>>> where mylist is a list of ylims:
>>>  mylist <- list(c(0,30), c(40,80), )
>>>
>>>
>>
>>
>>>  -Peter Ehlers
>>>
>>>
>>>  Thank you!
>>>>
>>>>
>>> --
>>> Peter Ehlers
>>> University of Calgary
>>>
>>>
>>
>>
>>
> --
> Peter Ehlers
> University of Calgary
>



-- 
Best,
Zhenjiang

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] a question related to table output

2010-04-23 Thread zhenjiang xu
Hi,

I have a data.frame object:
> a.df
   Methods Score
1 Northern 1.3544227
2 Northern 0.8302436
3   RT-PCR 1.0011360
4   RT-PCR 1.1149423

If I write it out with write.table,
> write.table(a.df, file = 'data.txt', quote = FALSE, sep = '\t', row.names
= FALSE)

the resulting data.txt looks like this:

Methods	Score
Northern	1.35442268939541
Northern	0.830243615689926
RT-PCR	1.00113601434407
RT-PCR	1.11494230904995

My question is: can I merge the two "Northern" entries into one cell, like
"Merge Cells" in MS Excel?

Best,
Zhenjiang
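
A tab-separated text file has no notion of merged cells, so the closest plain-text equivalent is probably to blank out a label when it repeats the one directly above it. The sketch below assumes that is acceptable and that Methods is stored as character rather than factor. The names out, repeated and outfile are made up for illustration, and the file is written to a temporary directory.

```r
a.df <- data.frame(
  Methods = c("Northern", "Northern", "RT-PCR", "RT-PCR"),
  Score   = c(1.3544227, 0.8302436, 1.0011360, 1.1149423),
  stringsAsFactors = FALSE
)

out <- a.df
# TRUE where a label equals the label in the row directly above it.
repeated <- c(FALSE, head(out$Methods, -1) == tail(out$Methods, -1))
out$Methods[repeated] <- ""

outfile <- file.path(tempdir(), "data.txt")
write.table(out, file = outfile, quote = FALSE, sep = "\t",
            row.names = FALSE)
```

Opened in a spreadsheet, the empty cells under each label read much like a merged cell, although they remain separate cells.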



Re: [R] the bar width of barchart plot in lattice package

2010-04-23 Thread zhenjiang xu
Probably yes. I plotted each row individually instead. Thanks.

On Fri, Apr 23, 2010 at 11:14 AM, Deepayan Sarkar  wrote:

> On Wed, Apr 21, 2010 at 8:55 PM, David Winsemius 
> wrote:
> >
> > On Apr 21, 2010, at 9:51 PM, zhenjiang xu wrote:
> >
> >> I tried that. It seems the bar width is already maximized, although
> >> there is a lot of space between groups of bars. Thank you anyway.
> >
> > I apologize. It was reproducible code. I missed the "values" assignment.
> > There is also a box.width argument which does affect how the plot gets
> > drawn, but the effects do not appear salutary. It appears that the
> > alignment of the bars gets shifted relative to the labels. The barchart
> > function cannot seem to deal with the complexity of the 2 * 5 factor
> > crossed with a c(3,3,4) factor. On the other hand that problem seems to
> > be present in the original plot as well. Maybe you should re-think the
> > structure of the data?
>
> The problem is that levels is nested within factors:
>
> > xtabs(~levels + factors, a)
>                            factors
> levels                      Cycles MaxPairs Order
>   Cycle 1                       10        0     0
>   Cycle 2                       10        0     0
>   Cycle 3                       10        0     0
>   Cycle 4                       10        0     0
>   Order 1                        0        0    10
>   Order 2                        0        0    10
>   Order 3                        0        0    10
>   MaxPairs = 20                  0       10     0
>   MaxPairs = Average Length      0       10     0
>   MaxPairs = 500                 0       10     0
>
> I can't think of a meaningful design that would give the desired result
> here.
>
> -Deepayan
>



-- 
Best,
Zhenjiang
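
One way to read "plotted each row individually" is sketched below: split the data by factors, drop the unused values of levels in each piece, and draw one barchart per piece so that each plot only reserves space for the three or four groups it actually contains. This is an assumption about what was done, not the poster's actual code. The toy data are rebuilt from the original post, and the names lev and plots are made up for illustration.

```r
library(lattice)

# Toy data rebuilt from the original post (values are random).
scores    <- gl(2, 5, labels = c("Sensitivity", "PPV"), length = 100)
sequences <- gl(5, 1, labels = c("Lemna minor", "Dugesia japonica A",
                                 "Gymnosporangium sabinae",
                                 "Hymeniacidon sanguinea",
                                 "Streptomyces griseus"), length = 100)
lev <- gl(10, 10, labels = c("Cycle 1", "Cycle 2", "Cycle 3", "Cycle 4",
                             "Order 1", "Order 2", "Order 3",
                             "MaxPairs = 20", "MaxPairs = Average Length",
                             "MaxPairs = 500"))
a <- data.frame(values = rnorm(100), scores = scores,
                sequences = sequences, levels = lev,
                factors = c(rep("Cycles", 40), rep("Order", 30),
                            rep("MaxPairs", 30)))

# One plot per value of `factors`, keeping only the levels used in it.
plots <- lapply(split(a, a$factors), function(d) {
  d$levels <- droplevels(d$levels)
  barchart(values ~ sequences | scores, data = d, groups = levels,
           auto.key = list(rectangles = TRUE, points = FALSE,
                           space = "right"),
           scales = list(x = list(rot = 45)))
})

print(plots[["Cycles"]])
```

Each element of plots is a separate trellis object, so the three can be printed to separate pages or arranged on one page by hand.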



Re: [R] how to reorder of groups and specify ylim for each row in lattice barchart

2010-04-23 Thread zhenjiang xu
Peter, thanks, but that doesn't work. Did I miss something?

library(lattice)
mylist <- list(c(0,30), c(40,80), )
barchart(yield ~ variety | site,data=barley, groups = year, layout =
c(1,6),auto.key = list(points = FALSE, rectangles = TRUE, space =
"right"),ylab = "Barley Yield (bushels/acre)",scales = list(x = list(rot =
45), y=list(relation='free', ylim=mylist)))

On Thu, Apr 22, 2010 at 7:54 PM, Peter Ehlers  wrote:

> On 2010-04-21 21:13, zhenjiang xu wrote:
>
>> R experts,
>>
>> Is there any way to reorder the bars inside each group? In the following
>> example, the bar for year 1932 is always plotted before the bar for year
>> 1931. May I change the order inside each group of bars?
>>
>>
> Do you mean a different order in different panels? That seems to
> me to defeat the purpose of panels. I can't think of an easy way
> to do that.
>
>
>  library(lattice)
>> barchart(yield ~ variety | site,data=barley, groups = year, layout =
>> c(1,6),auto.key = list(points = FALSE, rectangles = TRUE, space =
>> "right"),ylab = "Barley Yield (bushels/acre)",scales = list(x = list(rot =
>> 45)))
>>
>>
>> Another question is: may I specify a different y-axis range for each
>> row of panels? In the example above, how can I change the y range of
>> the "Waseca" panel to (40,70) from the default (0,80)? Please note that I
>> don't want to set the argument "scales = list(y=list(relation='free'))",
>> because the automatic choice of ranges for the different panels isn't good
>> enough for me. Basically, I'd like to control the y ranges manually.
>>
>
> You can use scales() with
>  y=list(relation='free', ylim=mylist)
>
> where mylist is a list of ylims:
>  mylist <- list(c(0,30), c(40,80), )
>


>
>  -Peter Ehlers
>
>
>> Thank you!
>>
>
> --
> Peter Ehlers
> University of Calgary
>



-- 
Best,
Zhenjiang



[R] how to reorder of groups and specify ylim for each row in lattice barchart

2010-04-21 Thread zhenjiang xu
R experts,

Is there any way to reorder the bars inside each group? In the following
example, the bar for year 1932 is always plotted before the bar for year
1931. May I change the order inside each group of bars?

library(lattice)
barchart(yield ~ variety | site,data=barley, groups = year, layout =
c(1,6),auto.key = list(points = FALSE, rectangles = TRUE, space =
"right"),ylab = "Barley Yield (bushels/acre)",scales = list(x = list(rot =
45)))


Another question is: may I specify a different y-axis range for each
row of panels? In the example above, how can I change the y range of the
"Waseca" panel to (40,70) from the default (0,80)? Please note that I don't
want to set the argument "scales = list(y=list(relation='free'))", because
the automatic choice of ranges for the different panels isn't good enough
for me. Basically, I'd like to control the y ranges manually.

Thank you!
-- 
Best,
Zhenjiang
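
On the first question, if the goal is the same order in every panel (1931 before 1932), one possible approach is to reorder the levels of the grouping factor, since lattice appears to draw grouped bars in factor-level order. This is a minimal sketch under that assumption. The copy b is a made-up name, used so that the packaged barley data stay untouched.

```r
library(lattice)

# Work on a copy so the packaged dataset is left alone.
b <- barley

# Put 1931 first; grouped bars are expected to follow the level order.
b$year <- factor(b$year, levels = c("1931", "1932"))

p <- barchart(yield ~ variety | site, data = b, groups = year,
              layout = c(1, 6),
              auto.key = list(points = FALSE, rectangles = TRUE,
                              space = "right"),
              ylab = "Barley Yield (bushels/acre)",
              scales = list(x = list(rot = 45)))
print(p)
```

A different order in different panels would need more than this, because the level order applies to the whole plot.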



Re: [R] the bar width of barchart plot in lattice package

2010-04-21 Thread zhenjiang xu
I tried that. It seems the bar width is already maximized, although there is
a lot of space between groups of bars. Thank you anyway.

On Tue, Apr 20, 2010 at 10:16 AM, David Winsemius wrote:

>
> On Apr 20, 2010, at 9:46 AM, zhenjiang xu wrote:
>
>  Dear R users,
>>
>> I am trying to use the following code to make a barchart plot. The bars in
>> the plot turn out to be a little narrow. Is there any way to modify the
>> width of the bars? Thank you!
>>
>> library(lattice)
>> scores = gl(2, 5, label=c('Sensitivity', 'PPV'), length = 100)
>> sequences = gl(5, 1, label=c('Lemna minor', 'Dugesia japonica A',
>> 'Gymnosporangium sabinae', 'Hymeniacidon sanguinea',
>> 'Streptomyces griseus'), length = 100)
>> levels = gl(10, 10, label = c('Cycle 1', 'Cycle 2', 'Cycle 3', 'Cycle 4',
>> 'Order 1', 'Order 2', 'Order 3', 'MaxPairs = 20',
>> 'MaxPairs = Average Length', 'MaxPairs = 500'))
>> factors = c(rep('Cycles', 40), rep('Order', 30), rep('MaxPairs', 30))
>> values = rnorm(100) # this is toy data
>> a = data.frame(values, scores, sequences, levels, factors)
>> bc.factors =
>>  barchart(values ~ sequences | scores * factors , data = a,
>>  groups = levels,
>>  layout = c(2,3),
>>  between = list(y=0.5),
>>  clip = list(strip = 'off'),
>>  par.strip.text = list(cex=0.7),
>>  par.settings = list(fontsize=list(text=8)),
>>  auto.key = list(rectangles = TRUE, space = 'right', columns = 1),
>>  draw.key = TRUE,
>>  scales = list(x = list(rot = 45)))
>>
>>
> ?barchart
>
> Looking at the arguments to barchart in the help page I would have guessed
> that box.ratio would do what you want. Since that is clearly not
> reproducible code , (in the absence of test dataset of the appropriate
> structure) I suppose guessing will remain the level of my knowledge in this
> instance.
>
>
>  --
>> Best,
>> Zhenjiang
>>
>
> David Winsemius, MD
> West Hartford, CT
>
>


-- 
Best,
Zhenjiang
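
For anyone wanting to see what box.ratio does in isolation, the sketch below draws the same grouped barchart twice on the barley data that ships with lattice. The values 1 and 4 are arbitrary choices for comparison, and the names narrow and wide are made up for illustration. It does not address the nested-factor layout of the data in this thread, where most of the reserved space belongs to groups that are absent from a given panel.

```r
library(lattice)

# Same plot with two box.ratio values; larger values should widen the bars
# relative to the gaps between groups.
narrow <- barchart(yield ~ variety | site, data = barley, groups = year,
                   box.ratio = 1, scales = list(x = list(rot = 45)))
wide   <- barchart(yield ~ variety | site, data = barley, groups = year,
                   box.ratio = 4, scales = list(x = list(rot = 45)))

print(wide)
```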



[R] the bar width of barchart plot in lattice package

2010-04-20 Thread zhenjiang xu
Dear R users,

I am trying to use the following code to make a barchart plot. The bars in
the plot turn out to be a little narrow. Is there any way to modify the
width of the bars? Thank you!

library(lattice)
scores = gl(2, 5, label=c('Sensitivity', 'PPV'), length = 100)
sequences = gl(5, 1, label=c('Lemna minor', 'Dugesia japonica A',
'Gymnosporangium sabinae', 'Hymeniacidon sanguinea',
'Streptomyces griseus'), length = 100)
levels = gl(10, 10, label = c('Cycle 1', 'Cycle 2', 'Cycle 3', 'Cycle 4',
'Order 1', 'Order 2', 'Order 3', 'MaxPairs = 20',
'MaxPairs = Average Length', 'MaxPairs = 500'))
factors = c(rep('Cycles', 40), rep('Order', 30), rep('MaxPairs', 30))
values = rnorm(100) # this is toy data
a = data.frame(values, scores, sequences, levels, factors)
bc.factors =
  barchart(values ~ sequences | scores * factors , data = a,
   groups = levels,
   layout = c(2,3),
   between = list(y=0.5),
   clip = list(strip = 'off'),
   par.strip.text = list(cex=0.7),
   par.settings = list(fontsize=list(text=8)),
   auto.key = list(rectangles = TRUE, space = 'right', columns = 1),
   draw.key = TRUE,
   scales = list(x = list(rot = 45)))

-- 
Best,
Zhenjiang
