Re: [GENERAL] histogram

2011-05-01 Thread Thomas Markus

hi,

group by 1 means group by first output column
order by 2 means order by second output column ascending



Am 30.04.2011 19:00, schrieb Joel Reymont:

What is the meaning of

group by 1 order by 2

e.g. what to the numbers 1 and 2 stand for?

What would change if I do the following?

group by 1 order by 1

On Apr 30, 2011, at 5:48 PM, Thomas Markus wrote:


Hi,

try something like this:

select
trunc(random() * 10.)/10.
, count(*)
from
generate_series(1,200)
group by 1 order by 2

--
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
-++---
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
-++---






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] histogram

2011-04-30 Thread Rob Sargent



David Johnston wrote:

Given that you are actively implementing the code that uses the 1 and 2 I
don't see how it is that egregious.  When generating calculated fields it is
cleaner than the alternative:

Select trunc(distance * 10.)/10., count(*)
From doc_ads
Group by (trunc(distance * 10.))
Order by (trunc(distance * 10.))

It would be nice if you could do:

Select trunc(distance * 10.)/10. AS bin, count(*) AS frequency
From doc_ads
Group by bin
Order by bin

But I do not believe that is allowed (though I may have my syntax wrong...)

David J.

  

re: 1 and 2. They're horrible (imho) reference to the attributes of the
  

returned tuple. Or at best an exposure of the implementation. :)

  

Joel Reymont wrote:
  

I think this should do what I want

select trunc(distance * 10.)/10., count(*)
from doc_ads
group by 1 order by 1

  Thanks, Joel



  

I think we're supposed to bottom-post here.

I agree in the case of generated columns and old servers but you see the 
practice more commonly than really necessary. But in 8.4 at least


select trunc(distance * 10.0 )/10.0 as histo, count(*) as tally
from d group by histo order by tally;


works just fine for me

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] histogram

2011-04-30 Thread David Johnston
Given that you are actively implementing the code that uses the 1 and 2 I
don't see how it is that egregious.  When generating calculated fields it is
cleaner than the alternative:

Select trunc(distance * 10.)/10., count(*)
>From doc_ads
Group by (trunc(distance * 10.))
Order by (trunc(distance * 10.))

It would be nice if you could do:

Select trunc(distance * 10.)/10. AS bin, count(*) AS frequency
>From doc_ads
Group by bin
Order by bin

But I do not believe that is allowed (though I may have my syntax wrong...)

David J.

>> re: 1 and 2. They're horrible (imho) reference to the attributes of the
returned tuple. Or at best an exposure of the implementation. :)

>>Joel Reymont wrote:
>>> I think this should do what I want
>>>
>>> select trunc(distance * 10.)/10., count(*)
>>> from doc_ads
>>> group by 1 order by 1
>>>
>>>   Thanks, Joel



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] histogram

2011-04-30 Thread Rob Sargent
re: 1 and 2. They're horrible (imho) reference to the attributes of the 
returned tuple. Or at best an exposure of the implementation. :)


Order by "2" if you want the most frequent (highest counts) of your 
distances at the bottom of the output (or ordery by 2 desc) if you want 
them at the top of your output.


Joel Reymont wrote:

I think this should do what I want

select trunc(distance * 10.)/10., count(*)
from doc_ads
group by 1 order by 1

  Thanks, Joel


--
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
-++---
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
-++---




  


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] histogram

2011-04-30 Thread Joel Reymont
I think this should do what I want

select trunc(distance * 10.)/10., count(*)
from doc_ads
group by 1 order by 1

  Thanks, Joel


--
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
-++---
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
-++---




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] histogram

2011-04-30 Thread Joel Reymont
What is the meaning of 

group by 1 order by 2

e.g. what to the numbers 1 and 2 stand for?

What would change if I do the following?

group by 1 order by 1

On Apr 30, 2011, at 5:48 PM, Thomas Markus wrote:

> Hi,
> 
> try something like this:
> 
> select
>trunc(random() * 10.)/10.
>, count(*)
> from
>generate_series(1,200)
> group by 1 order by 2

--
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
-++---
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
-++---




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] histogram

2011-04-30 Thread Joel Reymont
Thank you Thomas!

Is there a way for the code below to determine the number of rows in the table 
and use it?

Thanks, Joel

On Apr 30, 2011, at 5:48 PM, Thomas Markus wrote:

> Hi,
> 
> try something like this:
> 
> select
>trunc(random() * 10.)/10.
>, count(*)
> from
>generate_series(1,200)
> group by 1 order by 2

--
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
-++---
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
-++---




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] histogram

2011-04-30 Thread Thomas Markus

Hi,

try something like this:

select
trunc(random() * 10.)/10.
, count(*)
from
generate_series(1,200)
group by 1 order by 2

regards
Thomas

Am 30.04.2011 18:37, schrieb Joel Reymont:

I have a column of 2 million float values from 0 to 1.

I would like to figure out how many values fit into buckets spaced by 0.10, 
e.g. from 0 to 0.10, from 0.10 to 0.20, etc.

What is the best way to do this?

Thanks, Joel

--
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
-++---
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
-++---







--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] histogram

2011-04-30 Thread Joel Reymont
I have a column of 2 million float values from 0 to 1.

I would like to figure out how many values fit into buckets spaced by 0.10, 
e.g. from 0 to 0.10, from 0.10 to 0.20, etc.

What is the best way to do this?

Thanks, Joel

--
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
-++---
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
-++---




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Histogram generator

2010-07-28 Thread Sam Mason
On Tue, Jul 27, 2010 at 09:25:05PM -0400, Patrick May wrote:
> On Jul 27, 2010, at 9:21 PM, Steve Atkins wrote:
> > select date_trunc('hour', foo) + interval '30 minutes' * 
> > floor(extract(minute from foo) / 30) as start, event, count(*) from bar 
> > group by 1, 2 order by 1 asc;
> 
>   Thanks!  It looks like interval is what I need to play with.

Another useful tool to use is the classic unix "seconds since epoch".
You could turn the key expression from above into:

  timestamp 'epoch' + interval '1 second' * floor(date_part('epoch',foo) / 
(30*60)) * (30*60)

I'd probably go with Steve's version here, it's a bit more obvious
what's going on.  Also note, that if you don't really care about what
the specific groups are, just that you have a set of evenly divided
30minute periods you don't need to convert back to a date, so could just
use:

  floor(date_part('epoch',foo) / (30*60))

One final note, if you're dealing with lots of data and the above
expression is slow, you could delay converting back to a date until
"after" the grouping, i.e:

  SELECT timestamp 'epoch' + interval '1 second' * floor(date_part('epoch',foo) 
/ (30*60)) * (30*60) AS t, COUNT(*)
  FROM data
  GROUP BY floor(date_part('epoch',foo) / (30*60));

This will save PG from converting back to a date for every row when it's
going to chuck most of them away anyway.

Hope that gives you some more ideas!

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Histogram generator

2010-07-27 Thread Patrick May
On Jul 27, 2010, at 9:21 PM, Steve Atkins wrote:
> On Jul 27, 2010, at 6:07 PM, Patrick May wrote:
>> On Jul 27, 2010, at 7:12 PM, Dann Corbit wrote:
I have a table containing events with timestamps.  I would like
 to generate a histogram of the number of each type of event for each
 half-hour period from 8:00 am to 6:00 pm.  Are there any tools that
 will help me do this?
>>> 
>>> GROUP BY is essentially a histogram generator.
>>> The age() function will give you an interval.
>>> 
>>> I am not sure if you want to combine half hour periods from different days 
>>> with the same time or not (IOW are 8AM Tuesday and 8AM Wednesday supposed 
>>> to be grouped together or not?)
>>> 
>>> I guess that if you are more specific about exactly what you want you can 
>>> get a better answer.  Your requirement is a little bit vague or ambiguous.
>> 
>>  I don't mind using Excel to generate the actual graphic.  Ideally I'd 
>> get output something like this:
>> 
>>  datestart  endevent count
>> --    -
>> 2010-07-27 08:00:00 08:29:59  EVENT_120
>> 2010-07-27 08:00:00 08:29:59  EVENT_215
>> 2010-07-27 08:30:00 08:59:59  EVENT_110
>> 2010-07-27 08:30:00 08:59:59  EVENT_3 5
>> 
>> I'm quite familiar with SQL, but I'm not sure how to generate the half hour 
>> intervals without hard coding them.
> 
> There's probably a better way, but something like this might work:
> 
> select date_trunc('hour', foo) + interval '30 minutes' * floor(extract(minute 
> from foo) / 30) as start, event, count(*) from bar group by 1, 2 order by 1 
> asc;

Thanks!  It looks like interval is what I need to play with.

Regards,

Patrick



smime.p7s
Description: S/MIME cryptographic signature


Re: [GENERAL] Histogram generator

2010-07-27 Thread Steve Atkins

On Jul 27, 2010, at 6:07 PM, Patrick May wrote:

> On Jul 27, 2010, at 7:12 PM, Dann Corbit wrote:
>>> I have a table containing events with timestamps.  I would like
>>> to generate a histogram of the number of each type of event for each
>>> half-hour period from 8:00 am to 6:00 pm.  Are there any tools that
>>> will help me do this?
>> 
>> GROUP BY is essentially a histogram generator.
>> The age() function will give you an interval.
>> 
>> I am not sure if you want to combine half hour periods from different days 
>> with the same time or not (IOW are 8AM Tuesday and 8AM Wednesday supposed to 
>> be grouped together or not?)
>> 
>> I guess that if you are more specific about exactly what you want you can 
>> get a better answer.  Your requirement is a little bit vague or ambiguous.
> 
>   I don't mind using Excel to generate the actual graphic.  Ideally I'd 
> get output something like this:
> 
>   datestart  endevent count
> --    -
> 2010-07-27 08:00:00 08:29:59  EVENT_120
> 2010-07-27 08:00:00 08:29:59  EVENT_215
> 2010-07-27 08:30:00 08:59:59  EVENT_110
> 2010-07-27 08:30:00 08:59:59  EVENT_3 5
> 
> I'm quite familiar with SQL, but I'm not sure how to generate the half hour 
> intervals without hard coding them.

There's probably a better way, but something like this might work:

select date_trunc('hour', foo) + interval '30 minutes' * floor(extract(minute 
from foo) / 30) as start, event, count(*) from bar group by 1, 2 order by 1 asc;

Cheers,
  Steve


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Histogram generator

2010-07-27 Thread Patrick May
On Jul 27, 2010, at 7:12 PM, Dann Corbit wrote:
>>  I have a table containing events with timestamps.  I would like
>> to generate a histogram of the number of each type of event for each
>> half-hour period from 8:00 am to 6:00 pm.  Are there any tools that
>> will help me do this?
> 
> GROUP BY is essentially a histogram generator.
> The age() function will give you an interval.
> 
> I am not sure if you want to combine half hour periods from different days 
> with the same time or not (IOW are 8AM Tuesday and 8AM Wednesday supposed to 
> be grouped together or not?)
> 
> I guess that if you are more specific about exactly what you want you can get 
> a better answer.  Your requirement is a little bit vague or ambiguous.

I don't mind using Excel to generate the actual graphic.  Ideally I'd 
get output something like this:

  datestart  endevent count
--    -
2010-07-27 08:00:00 08:29:59  EVENT_120
2010-07-27 08:00:00 08:29:59  EVENT_215
2010-07-27 08:30:00 08:59:59  EVENT_110
2010-07-27 08:30:00 08:59:59  EVENT_3 5

I'm quite familiar with SQL, but I'm not sure how to generate the half hour 
intervals without hard coding them.

Thanks,

Patrick



smime.p7s
Description: S/MIME cryptographic signature


Re: [GENERAL] Histogram generator

2010-07-27 Thread Dann Corbit
> -Original Message-
> From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
> ow...@postgresql.org] On Behalf Of Patrick May
> Sent: Tuesday, July 27, 2010 3:59 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Histogram generator
> 
> Hi,
> 
>   I have a table containing events with timestamps.  I would like
> to generate a histogram of the number of each type of event for each
> half-hour period from 8:00 am to 6:00 pm.  Are there any tools that
> will help me do this?

GROUP BY is essentially a histogram generator.
The age() function will give you an interval.

I am not sure if you want to combine half hour periods from different days with 
the same time or not (IOW are 8AM Tuesday and 8AM Wednesday supposed to be 
grouped together or not?)

I guess that if you are more specific about exactly what you want you can get a 
better answer.  Your requirement is a little bit vague or ambiguous.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Histogram generator

2010-07-27 Thread Patrick May
Hi,

I have a table containing events with timestamps.  I would like to 
generate a histogram of the number of each type of event for each half-hour 
period from 8:00 am to 6:00 pm.  Are there any tools that will help me do this?

Thanks,

Patrick



smime.p7s
Description: S/MIME cryptographic signature