Re: hadoop results

2011-06-30 Thread William Oberman
I think I'll do the former, thanks!

On Wed, Jun 29, 2011 at 11:16 PM, aaron morton wrote:

> How about get_slice() with reversed == true and count = 1 to get the
> highest time UUID?
>
> Or you can also store a column with a magic name that has the value of the
> timeuuid that is the current metric to use.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 30 Jun 2011, at 06:35, William Oberman wrote:
>
> > I'll start with my question: given a CF with comparator TimeUUIDType,
> > what is the most efficient way to get the greatest column's value?
> >
> > Context: I've been running cassandra for a couple of months now, so
> > obviously it's time to start layering more on top :-)  In my test
> > environment, I managed to get pig/hadoop running, and developed a few
> > scripts to collect metrics I've been missing since I switched from MySQL to
> > cassandra (including the ever useful "select count(*) from table"
> > equivalent).
> >
> > I was hoping to dump the results of this processing back into cassandra
> > for use in other tools/processes.  My initial thought was: new CF called
> > "stats" with comparator TimeUUIDType.  The basic idea being I'd store:
> > stat_name -> time stat was computed (as UUID) -> value
> > That way I can also see a historical perspective of any given stat for
> > auditing (and for cumulative stats to see trends).  The stat_name itself is
> > a URI that is composed of "what" and any constraints on the "what"
> > (including an optional time range, if the stat supports it).  E.g.
> > ClassOfSomething/ID/MetricName/OptionalTimeRange (or something, still
> > deciding on the format of the URI).  But, right now, the only way I know to
> > get the "current" stat value would be to iterate over all columns (the
> > TimeUUIDs) and then return the last one.
> >
> > Thanks for any tips,
> >
> > will
>
>


-- 
Will Oberman
Civic Science, Inc.
3030 Penn Avenue., First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) ober...@civicscience.com


Re: hadoop results

2011-06-29 Thread aaron morton
How about get_slice() with reversed == true and count = 1 to get the highest
time UUID?
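
A rough illustration of the reversed-slice idea, not code from this thread: the Thrift SliceRange has `reversed` and `count` fields, and a reversed slice with a count of 1 asks for the last column in comparator order. The sketch below only models that behaviour in memory with Python's standard uuid module. The names `get_slice`, `latest_value` and `make_timeuuid` are hypothetical helpers, not Cassandra client calls, and ordering by the UUID timestamp alone is a simplification of what TimeUUIDType does.

```python
import uuid


def make_timeuuid(ticks):
    """Build a version 1 UUID from a raw 60-bit timestamp (hypothetical helper)."""
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000  # version 1
    # Clock sequence and node are fixed so the example is deterministic.
    return uuid.UUID(
        fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, 0x123456789ABC)
    )


def get_slice(row, count, reverse=False):
    """In-memory model of a slice over one row keyed by TimeUUID column names.

    `row` is a dict {uuid.UUID: value}. Columns are ordered by the timestamp
    embedded in each UUID; `reverse=True` walks from newest to oldest.
    """
    ordered = sorted(row.items(), key=lambda kv: kv[0].time, reverse=reverse)
    return ordered[:count]


def latest_value(row):
    """Return the value of the column with the greatest timestamp, or None."""
    newest = get_slice(row, count=1, reverse=True)
    return newest[0][1] if newest else None


if __name__ == "__main__":
    row = {
        make_timeuuid(100): "first",
        make_timeuuid(300): "third",
        make_timeuuid(200): "second",
    }
    print(latest_value(row))
```

The point of the sketch is that the read asks for a single column from the "far end" of the row, so nothing has to iterate over the whole history on the client side.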

Or you can also store a column with a magic name that has the value of the
timeuuid that is the current metric to use.
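
A minimal sketch of the "pointer" variant, again as an in-memory model rather than real client code. One assumption to flag loudly: a TimeUUIDType comparator expects column names to be TimeUUIDs, so this sketch keeps the pointer in a separate mapping (think of a second row or CF) instead of a literally named magic column. `StatsStore`, `record` and `latest` are hypothetical names.

```python
import uuid


class StatsStore:
    """In-memory model: full history per stat plus a pointer to the current value."""

    def __init__(self):
        self.history = {}  # stat_name -> {timeuuid: value}
        self.current = {}  # stat_name -> timeuuid of the value to use

    def record(self, stat_name, value, column_name=None):
        """Store a new value and move the pointer to it."""
        if column_name is None:
            column_name = uuid.uuid1()  # time-based UUID from the stdlib
        self.history.setdefault(stat_name, {})[column_name] = value
        self.current[stat_name] = column_name
        return column_name

    def latest(self, stat_name):
        """Two lookups: read the pointer, then read the column it names."""
        return self.history[stat_name][self.current[stat_name]]


if __name__ == "__main__":
    store = StatsStore()
    store.record("ClassOfSomething/ID/MetricName", 10)
    store.record("ClassOfSomething/ID/MetricName", 11)
    print(store.latest("ClassOfSomething/ID/MetricName"))
```

Note that in this model "current" means "last written", which is not necessarily the highest timestamp if writes arrive out of order. The trade-off against the reversed slice is an extra write per stat in exchange for a read that does not depend on column ordering.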

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 30 Jun 2011, at 06:35, William Oberman wrote:

> I'll start with my question: given a CF with comparator TimeUUIDType, what is 
> the most efficient way to get the greatest column's value?
> 
> Context: I've been running cassandra for a couple of months now, so obviously 
> it's time to start layering more on top :-)  In my test environment, I 
> managed to get pig/hadoop running, and developed a few scripts to collect 
> metrics I've been missing since I switched from MySQL to cassandra (including 
> the ever useful "select count(*) from table" equivalent).  
> 
> I was hoping to dump the results of this processing back into cassandra for 
> use in other tools/processes.  My initial thought was: new CF called "stats" 
> with comparator TimeUUIDType.  The basic idea being I'd store:
> stat_name -> time stat was computed (as UUID) -> value
> That way I can also see a historical perspective of any given stat for 
> auditing (and for cumulative stats to see trends).  The stat_name itself is a 
> URI that is composed of "what" and any constraints on the "what" (including 
> an optional time range, if the stat supports it).  E.g. 
> ClassOfSomething/ID/MetricName/OptionalTimeRange (or something, still 
> deciding on the format of the URI).  But, right now, the only way I know to 
> get the "current" stat value would be to iterate over all columns (the 
> TimeUUIDs) and then return the last one.
> 
> Thanks for any tips,
> 
> will



hadoop results

2011-06-29 Thread William Oberman
I'll start with my question: given a CF with comparator TimeUUIDType, what
is the most efficient way to get the greatest column's value?

Context: I've been running cassandra for a couple of months now, so
obviously it's time to start layering more on top :-)  In my test
environment, I managed to get pig/hadoop running, and developed a few
scripts to collect metrics I've been missing since I switched from MySQL to
cassandra (including the ever useful "select count(*) from table"
equivalent).

I was hoping to dump the results of this processing back into cassandra for
use in other tools/processes.  My initial thought was: new CF called "stats"
with comparator TimeUUIDType.  The basic idea being I'd store:
stat_name -> time stat was computed (as UUID) -> value
That way I can also see a historical perspective of any given stat for
auditing (and for cumulative stats to see trends).  The stat_name itself is
a URI that is composed of "what" and any constraints on the "what"
(including an optional time range, if the stat supports it).  E.g.
ClassOfSomething/ID/MetricName/OptionalTimeRange (or something, still
deciding on the format of the URI).  But, right now, the only way I know to
get the "current" stat value would be to iterate over all columns (the
TimeUUIDs) and then return the last one.
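
Since the URI format is still undecided, the following is only one possible sketch of building such a stat_name row key. The function name `build_stat_name`, the ".." range separator and the percent-encoding of each part are all assumptions for illustration, not a format anyone in this thread has settled on.

```python
from urllib.parse import quote


def build_stat_name(kind, entity_id, metric, time_range=None):
    """Join the "what" and its constraints into a slash-separated key.

    `time_range` is an optional (start, end) pair. Each part is
    percent-encoded so a "/" inside a part cannot be mistaken for a separator.
    """
    parts = [kind, str(entity_id), metric]
    if time_range is not None:
        start, end = time_range
        parts.append(f"{start}..{end}")
    return "/".join(quote(part, safe="") for part in parts)


if __name__ == "__main__":
    print(build_stat_name("ClassOfSomething", 42, "MetricName"))
    print(build_stat_name("ClassOfSomething", 42, "MetricName",
                          ("2011-06-01", "2011-06-30")))
```

Building the key in one place keeps the optional time range consistent between the jobs that write stats and the tools that read them.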

Thanks for any tips,

will