Re: [R] Question regarding sqldf

2009-07-31 Thread Rizzo . Michael
My sqldf package was already updated. The result from the example you
ran is also missing 2 of the five days so something still isn't working
correctly.




|
| From:  |
|
  
|
  |milton ruser milton.ru...@gmail.com
   |
  
|
|
| To:|
|
  
|
  |Michael Rizzo/RTP/USEPA/u...@epa 
 |
  
|
|
| Cc:|
|
  
|
  |r-help@r-project.org 
   |
  
|
|
| Date:  |
|
  
|
  |07/30/2009 05:56 PM  
   |
  
|
|
| Subject:   |
|
  
|
  |Re: [R] Question regarding sqldf 
   |
  
|





How about update R/sqldf?

 require(sqlf)
Loading required package: sqlf
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE,
logical.return = TRUE,  :
  there is no package called 'sqlf'

 testframe-data.frame(sdate=rep(paste
+ (day,1:5,sep=),each=5),bext=runif(25,1,100),otherstuff=runif
+ (25,45,60),stuff=runif(25,3,25))


 maxdays-sqldf(select distinct * from testframe group by sdate having
bext=max(bext))
 maxdays
  sdate bext otherstuff stuff
1  day1 77.20252   51.77503 12.045835
2  day2 91.37352   50.07259  4.468679
3  day5 74.29047   58.58456 10.027444



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question regarding sqldf

2009-07-31 Thread Gabor Grothendieck
Your sql statement is using a variable in the having clause
that is not in the group by clause.  Given that group by is supposed
to produce a single output per group we see that this construct is
problematic since what should happen in the case that there
are two equal maximum values in the same group?  Its possible
that some databases can support that anyways but I don't think
sqlite does.  Write your query like this:

  sqldf(select * from testframe a where a.bext =
+   (select max(bext) from testframe where a.sdate = sdate))
  sdate bext otherstuff stuff
1  day1 94.10626   47.20670 15.340856
2  day2 89.34949   55.36058 19.572773
3  day3 95.72650   52.16694 17.632534
4  day4 95.49586   47.28667 12.676297
5  day5 99.43271   48.98959  3.013745


On Thu, Jul 30, 2009 at 5:30 PM, rizzo.mich...@epamail.epa.gov wrote:
 Here is a test data frame:

 testframe-data.frame(sdate=rep(paste
 (day,1:5,sep=),each=5),bext=runif(25,1,100),otherstuff=runif
 (25,45,60),stuff=runif(25,3,25))
 testframe
   sdate      bext             otherstuff     stuff
 1   day1 37.863859   49.19816 10.036211
 2   day1 58.557049   59.23145 21.793954
 3   day1 70.345462   48.82313 24.630742
 4   day1  1.245913   57.70718 14.131270
 5   day1 40.170634   50.38963 21.420328
 6   day2 36.126846   51.97043 22.815169
 7   day2 95.801896   53.30061 13.540535
 8   day2 19.959486   50.46775  9.403084
 9   day2  5.772996   55.14082 17.137113
 10  day2 77.956966   51.44969 13.448527
 11  day3  6.743810   54.49836 17.650795
 12  day3 60.758896   55.35407  5.005359
 13  day3 86.924873   57.74122  8.691572
 14  day3 91.933544   48.17538  8.559737
 15  day3 23.627126   59.28906  4.766172
 16  day4 76.200158   55.78072 19.493428
 17  day4 67.809049   51.06784  7.116428
 18  day4 70.309643   54.38067 10.736461
 19  day4 91.152166   58.39768 23.199943
 20  day4 30.358417   54.16645 24.153823
 21  day5 64.064953   58.31813 19.856882
 22  day5 49.220106   48.49721 16.615940
 23  day5 21.554696   47.03421  3.585458
 24  day5 30.227874   55.51724 14.435731
 25  day5 51.518716   54.31445 20.837326

 If I run the following sqldf statement:

  maxdays-sqldf(select distinct * from testframe group by sdate having
 bext=max(bext))

 I get:

 maxdays
 data frame with 0 columns and 0 rows

 I want to get the entire observation for each day where bext is the
 maximum for the day.



 |
 | From:      |
 |
  |
  |milton ruser milton.ru...@gmail.com                                       
                                                                 |
  |
 |
 | To:        |
 |
  |
  |Michael Rizzo/RTP/USEPA/u...@epa                                            
                                                                   |
  |
 |
 | Cc:        |
 |
  |
  |r-help@r-project.org                                                        
                                                                 |
  |
 |
 | Date:      |
 |
  |
  |07/30/2009 03:03 PM                                                         
                                                                 |
  |
 |
 | Subject:   |
 |
  |
  |Re: [R] Question regarding sqldf                                            
                                                                 |
  |





 Hi Rizzi,

 how about a reproducible example/data.frame?

 :-)
 milton

 On Thu, Jul 30, 2009 at 1:46 PM, rizzo.mich...@epamail.epa.gov wrote:

  Hello,

  I am having a problem using sqldf.  I'm trying to choose

Re: [R] Question regarding sqldf

2009-07-30 Thread milton ruser
Hi Rizzi,

how about a reproducible example/data.frame?

:-)
milton

On Thu, Jul 30, 2009 at 1:46 PM, rizzo.mich...@epamail.epa.gov wrote:


 Hello,

 I am having a problem using sqldf.  I'm trying to choose a subset of
 observations from a data set based on the date and maximum value of a
 variable by date.

 Here is the code I am using:

 test-sqldf(select distinct * from bextuse group by sdate having
 bext=max(bext),method=raw);

 The result I get back is a data frame with 0 rows and 0 columns.  I have
 tried the code in another program that utilizes SQL, and I retrieve the
 14 rows I was expecting to get.
 I looked at the SQLite information on the web, and it mentions that the
 having clause can be used to select observations from grouped data
 using aggregating functions.

 Any help or advice is greatly appreciated.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question regarding sqldf

2009-07-30 Thread Rizzo . Michael
Here is a test data frame:

 testframe-data.frame(sdate=rep(paste
(day,1:5,sep=),each=5),bext=runif(25,1,100),otherstuff=runif
(25,45,60),stuff=runif(25,3,25))
 testframe
   sdate  bext otherstuff stuff
1   day1 37.863859   49.19816 10.036211
2   day1 58.557049   59.23145 21.793954
3   day1 70.345462   48.82313 24.630742
4   day1  1.245913   57.70718 14.131270
5   day1 40.170634   50.38963 21.420328
6   day2 36.126846   51.97043 22.815169
7   day2 95.801896   53.30061 13.540535
8   day2 19.959486   50.46775  9.403084
9   day2  5.772996   55.14082 17.137113
10  day2 77.956966   51.44969 13.448527
11  day3  6.743810   54.49836 17.650795
12  day3 60.758896   55.35407  5.005359
13  day3 86.924873   57.74122  8.691572
14  day3 91.933544   48.17538  8.559737
15  day3 23.627126   59.28906  4.766172
16  day4 76.200158   55.78072 19.493428
17  day4 67.809049   51.06784  7.116428
18  day4 70.309643   54.38067 10.736461
19  day4 91.152166   58.39768 23.199943
20  day4 30.358417   54.16645 24.153823
21  day5 64.064953   58.31813 19.856882
22  day5 49.220106   48.49721 16.615940
23  day5 21.554696   47.03421  3.585458
24  day5 30.227874   55.51724 14.435731
25  day5 51.518716   54.31445 20.837326

If I run the following sqldf statement:

 maxdays-sqldf(select distinct * from testframe group by sdate having
bext=max(bext))

I get:

 maxdays
data frame with 0 columns and 0 rows

I want to get the entire observation for each day where bext is the
maximum for the day.



|
| From:  |
|
  
|
  |milton ruser milton.ru...@gmail.com
   |
  
|
|
| To:|
|
  
|
  |Michael Rizzo/RTP/USEPA/u...@epa 
 |
  
|
|
| Cc:|
|
  
|
  |r-help@r-project.org 
   |
  
|
|
| Date:  |
|
  
|
  |07/30/2009 03:03 PM  
   |
  
|
|
| Subject:   |
|
  
|
  |Re: [R] Question regarding sqldf 
   |
  
|





Hi Rizzi,

how about a reproducible example/data.frame?

:-)
milton

On Thu, Jul 30, 2009 at 1:46 PM, rizzo.mich...@epamail.epa.gov wrote:

  Hello,

  I am having a problem using sqldf.  I'm trying to choose a subset of
  observations from a data set based on the date and maximum value of a
  variable by date.

  Here is the code I am using:

  test-sqldf(select distinct * from bextuse group by sdate having
  bext=max(bext),method=raw);

  The result I get back is a data frame with 0 rows and 0 columns.  I
  have
  tried the code in another program that utilizes SQL, and I retrieve
  the
  14 rows I was expecting to get.
  I looked at the SQLite information on the web, and it mentions that
  the
  having clause can be used to select observations from grouped data
  using aggregating functions.

  Any help or advice is greatly appreciated.

  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal

Re: [R] Question regarding sqldf

2009-07-30 Thread milton ruser
How about update R/sqldf?

 require(sqlf)
Loading required package: sqlf
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return
= TRUE,  :
  there is no package called 'sqlf'

 testframe-data.frame(sdate=rep(paste
+ (day,1:5,sep=),each=5),bext=runif(25,1,100),otherstuff=runif
+ (25,45,60),stuff=runif(25,3,25))


 maxdays-sqldf(select distinct * from testframe group by sdate having
bext=max(bext))
 maxdays
  sdate bext otherstuff stuff
1  day1 77.20252   51.77503 12.045835
2  day2 91.37352   50.07259  4.468679
3  day5 74.29047   58.58456 10.027444

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question regarding sqldf

2009-07-30 Thread Gabor Grothendieck
I just answered essentially the same question from another person offline
within the last day.  Anyways:

Date class variables are stored as numbers, i.e. days since
the Epoch, in R and when transferred to sqlite get transferred
as numbers.  Try this:

 library(sqldf)
 DF - data.frame(a = Sys.Date() + 1:5, b = 1:5)
 DF
  a b
1 2009-07-31 1
2 2009-08-01 2
3 2009-08-02 3
4 2009-08-03 4
5 2009-08-04 5
 Sys.Date() + 2
[1] 2009-08-01
 s - sprintf(select * from DF where a = %d, Sys.Date() + 2)
 s
[1] select * from DF where a = 14457
 sqldf(s)
  a b
1 2009-08-01 2
2 2009-08-02 3
3 2009-08-03 4
4 2009-08-04 5

 # to compare against character string store a as character
 DF2 - transform(DF, a = as.character(a))
 sqldf(select * from DF2 where a = '2009-08-01')
  a b
1 2009-08-01 2
2 2009-08-02 3
3 2009-08-03 4
4 2009-08-04 5

On Thu, Jul 30, 2009 at 1:46 PM, rizzo.mich...@epamail.epa.gov wrote:

 Hello,

 I am having a problem using sqldf.  I'm trying to choose a subset of
 observations from a data set based on the date and maximum value of a
 variable by date.

 Here is the code I am using:

 test-sqldf(select distinct * from bextuse group by sdate having
 bext=max(bext),method=raw);

 The result I get back is a data frame with 0 rows and 0 columns.  I have
 tried the code in another program that utilizes SQL, and I retrieve the
 14 rows I was expecting to get.
 I looked at the SQLite information on the web, and it mentions that the
 having clause can be used to select observations from grouped data
 using aggregating functions.

 Any help or advice is greatly appreciated.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.