Re: Reason for not allowing null values in Column

2010-03-08 Thread Erik Holstad
On Mon, Mar 8, 2010 at 10:14 AM, Jonathan Ellis  wrote:

> On Mon, Mar 8, 2010 at 12:07 PM, Erik Holstad 
> wrote:
> > So why is it again that the value field in the Column cannot be null if
> it
> > is not the
> > value field in the map, but just a part of the value field?
>
> Because without a compelling reason to allow nulls, the best policy is
> not to do so.
>
This for me is about memory usage, I guess, so I was just curious whether
there was a good reason for using more than is needed; I suppose "best
policy" is that reason.

> > All of this makes total sense, I'm wondering about use cases where you
> > want to get an empty row when you don't know if it has been deleted or
> > not.
>
> If you're saying, "I understand that doing X would be Really
> Inefficient, but I want you to do it anyway because of some use case
> that nobody actually needs so far," then I think you have your answer.
>
> If that is not what you are asking then you'll need to give me a
> concrete example because I don't understand the question.
>
Well, I cannot say that I understand all of this, since I'm not getting it
:)
But for me, when you do a range query you want to know what data you have to
work with in those rows, and you are usually not too interested in the empty
ones. The reason for not returning the empty ones would be to save IO.


> -Jonathan
>
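[A minimal sketch of a common way to live with this restriction on the
client side: store a zero-length byte[] instead of null when a column has no
value. The class and method names below are made up purely for illustration.]

public class EmptyValueConvention {
    private static final byte[] NO_VALUE = new byte[0];

    // Store an empty array rather than null when there is nothing to put in the value.
    static byte[] toStoredValue(byte[] maybeValue) {
        return maybeValue == null ? NO_VALUE : maybeValue;
    }

    // On the way back out, treat a zero-length value as "no value".
    static boolean hasValue(byte[] storedValue) {
        return storedValue.length > 0;
    }

    public static void main(String[] args) {
        System.out.println(hasValue(toStoredValue(null))); // false
    }
}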



-- 
Regards Erik


Re: Reason for not allowing null values in Column

2010-03-08 Thread Erik Holstad
On Mon, Mar 8, 2010 at 9:30 AM, Jonathan Ellis  wrote:

> On Mon, Mar 8, 2010 at 11:22 AM, Erik Holstad 
> wrote:
> > I was probably a little bit unclear here. I'm wondering about the two
> byte[]
> > in Column.
> > One for name and one for value. I was under the impression that the
> > skiplistmap
> > wraps the Columns, not that the name and the value are themselves
> inserted
> > into a map?
>
> The column name is the key in one such map, yes.
>
So why is it again that the value field in the Column cannot be null, if it
is not the value field of the map itself but just part of the map's value?

>
> >> > is it really that expensive to check if the list is empty before
> >> > returning
> >> > that row
> >>
> >> Yes, because you have to check the entire row, which may be much
> >> larger than the given predicate.
> >
> > That makes sense, but why would you be interested in the rows present
> > outside
> > your specified predicate?
>
> Because get_range_slice says, "apply this predicate to the range of
> rows given," meaning, if the predicate result is empty, we have to
> include an empty result for that row key.  It is perfectly valid to
> perform such a query returning empty column lists for some or all
> keys, even if no deletions have been performed.  So to special case
> leaving out result entries for deletions, we have to check the entire
> rest of the row to make sure there is no undeleted data anywhere else
> either (in which case leaving the key out would be an error).
>
All of this makes total sense; I'm wondering about use cases where you want
to get an empty row when you don't know whether it has been deleted or not.
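[To make the shape of this concrete: a get_range_slice result maps each key
in the range to whatever columns matched the predicate, and an empty list can
mean either "deleted" or "nothing matched". The sketch below uses plain Java
maps and strings rather than the real Thrift KeySlice/Column types, purely to
show the client-side filtering.]

import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RangeSliceShape {
    public static void main(String[] args) {
        // key -> columns that matched the predicate (empty list kept on purpose)
        Map<String, List<String>> result = new LinkedHashMap<String, List<String>>();
        result.put("row1", Arrays.asList("colA", "colB"));
        result.put("row2", Collections.<String>emptyList()); // deleted, or simply no match
        result.put("row3", Arrays.asList("colA"));

        // The server cannot cheaply tell the two empty cases apart without scanning
        // the whole row, so a caller that only wants data filters client-side:
        for (Map.Entry<String, List<String>> e : result.entrySet()) {
            if (e.getValue().isEmpty()) continue;
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}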


-- 
Regards Erik


Re: Reason for not allowing null values in Column

2010-03-08 Thread Erik Holstad
On Mon, Mar 8, 2010 at 9:10 AM, Jonathan Ellis  wrote:

> On Mon, Mar 8, 2010 at 11:07 AM, Erik Holstad 
> wrote:
> > Why is it that null column values are not allowed?
>
> It's semantically unnecessary and potentially harmful at an
> implementation level.  (Many java Map implementations can't
> distinguish between a null key and a key that is not present.)
>
I was probably a little bit unclear here. I'm wondering about the two byte[]
in Column: one for the name and one for the value. I was under the
impression that the skip list map wraps the Columns, not that the name and
the value are themselves inserted into a map?
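[Jonathan's parenthetical about Java maps is easy to demonstrate; the snippet
below only illustrates that point and is not Cassandra code.]

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class NullInMaps {
    public static void main(String[] args) {
        Map<String, String> m = new HashMap<String, String>();
        m.put("present-but-null", null);
        // get() returns null in both cases, so they are indistinguishable
        // without an extra containsKey() call:
        System.out.println(m.get("present-but-null")); // null
        System.out.println(m.get("never-inserted"));   // null

        // ConcurrentSkipListMap simply refuses null keys and values:
        Map<String, String> cslm = new ConcurrentSkipListMap<String, String>();
        try {
            cslm.put("name", null);
        } catch (NullPointerException expected) {
            System.out.println("ConcurrentSkipListMap rejects null values");
        }
    }
}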


>
> > What is the reason for using a ConcurrentSkipListMap for
> > columns_ in ColumnFamily
> > compared to using the set version and use the comparator to sort on the
> name
> > field in IColumn?
>
> ?
>
> > For the call get_range_slice() you get all the rows returned even though
> > they might have been deleted,
>
> Yes, that is the point.
>
> > is it really that expensive to check if the list is empty before
> returning
> > that row
>
> Yes, because you have to check the entire row, which may be much
> larger than the given predicate.
>
That makes sense, but why would you be interested in the rows present
outside
your specified predicate?

>
> -Jonathan
>



-- 
Regards Erik


Reason for not allowing null values in Column

2010-03-08 Thread Erik Holstad
Hey!
I've been looking at the src and have a couple of questions:

Why is it that null column values are not allowed?

What is the reason for using a ConcurrentSkipListMap for columns_ in
ColumnFamily, compared to using the Set version and using the comparator to
sort on the name field in IColumn?
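[One practical difference, shown with plain Java collections below: a map
keyed by column name gives a direct get() by name, while a sorted set of
columns needs a probe element or a scan to find one. The Col class is a
made-up stand-in for IColumn.]

import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ConcurrentSkipListSet;

public class MapVsSortedSet {
    // Hypothetical stand-in for IColumn, just to show the trade-off.
    static class Col {
        final String name; final String value;
        Col(String name, String value) { this.name = name; this.value = value; }
    }

    public static void main(String[] args) {
        // Map keyed by name, roughly the columns_ approach: direct lookup by name.
        ConcurrentSkipListMap<String, Col> byName = new ConcurrentSkipListMap<String, Col>();
        byName.put("age", new Col("age", "30"));
        Col direct = byName.get("age");

        // Set ordered by a name comparator: same iteration order, but looking up a
        // single column requires building a dummy element to probe with.
        ConcurrentSkipListSet<Col> sorted = new ConcurrentSkipListSet<Col>(
                new Comparator<Col>() {
                    public int compare(Col a, Col b) { return a.name.compareTo(b.name); }
                });
        sorted.add(new Col("age", "30"));
        Col probed = sorted.ceiling(new Col("age", null));

        System.out.println(direct.value + " / " + probed.value);
    }
}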

For the call get_range_slice() you get all the rows returned even though
they might have been deleted. Is it really that expensive to check whether
the list is empty before returning that row, or are there some other places
where this gets complicated?

-- 
Regards Erik


Re: ColumnFamilies vs composite rows in one table.

2010-03-06 Thread Erik Holstad
Thanks David and Jonathan!

@David
Yes, rows don't have a name; I'm just using the word "name" for anything,
like cluster name, table name, row name, etc. That is my bad.

Yes, I did change two things, which was probably stupid, but the reason for
the second change is space efficiency.

You are totally right that I'm choosing between scalability and performance
with the different structures. What I really want to do is to just store
indices in rows with a composite key and do range queries. Jonathan has
firmly steered me away from this approach for now with regard to
performance.

Thanks a lot!
Erik


ColumnFamilies vs composite rows in one table.

2010-03-05 Thread Erik Holstad
What are the benefits of using multiple ColumnFamilies compared to using a
composite
row name?

Example: you have messages that you want to index on from and to.

So you can either have
ColumnFamilyFrom:userTo:{userFrom->messageid}
ColumnFamilyTo:userFrom:{userTo->messageid}

or something like
ColumnFamily:user_to:{user1_messageId, user2_messageId}
ColumnFamily:user_from:{user1_messageId, user2_messageId}

One advantage I can see of using separate families is if you want to use
different types in the families. But are there others, like storage space,
read/write speed, etc.?
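[A tiny sketch of how the composite row keys and column names in the second
layout could be built on the client. The separator and the helper names are
invented; the point is only that the row key and column name together carry
the same information as the two separate ColumnFamilies.]

public class MessageIndexKeys {
    private static final char SEP = ':';

    // Row key such as "erik:to" or "erik:from"
    static String rowKey(String user, String direction) {
        return user + SEP + direction;
    }

    // Column name such as "alice:msg42"; the column value can stay empty.
    static String columnName(String otherUser, String messageId) {
        return otherUser + SEP + messageId;
    }

    public static void main(String[] args) {
        System.out.println(rowKey("erik", "to"));          // erik:to
        System.out.println(columnName("alice", "msg42"));  // alice:msg42
    }
}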

-- 
Regards Erik


Re: Storage format

2010-03-02 Thread Erik Holstad
Thank you!


Re: Storage format

2010-03-01 Thread Erik Holstad
On Mon, Mar 1, 2010 at 2:51 PM, Jonathan Ellis  wrote:

> On Mon, Mar 1, 2010 at 4:49 PM, Erik Holstad 
> wrote:
> > Haha!
> > Thanks. Well I'm a little bit worried about this but since the indexes
> > are pretty small I don't think it is going to be too bad. But I was
> > mostly thinking about performance and having the index row as a
> > bottleneck for writing, since the partition is per row.
>
> Writing N columns to 1 row is faster than writing 1 column to N rows,
> even when all N are coming from different clients.  Our concurrency
> story there is excellent.
>
That sounds good, and the same thing goes for reading, because that is
basically what I'm looking for: faster reads. I'm not too worried about the
writes.

Thanks a lot!


Re: Storage format

2010-03-01 Thread Erik Holstad
Haha!
Thanks. Well I'm a little bit worried about this but since the indexes are
pretty small I don't think it is going to be too bad. But I was mostly
thinking about performance and having the index row as a bottleneck for
writing, since the partition is per row.


-- 
Regards Erik


Re: Is Cassandra a document based DB?

2010-03-01 Thread Erik Holstad
Yes, Cassandra has supercolumns and HBase has versions, and you are probably
correct that supercolumns are more used than versions, but I don't really
think you can compare them, since versions are not a serialized structure.

The reason that I didn't include table and family in the mapping is that, as
I've understood it, a SQL table can be compared to the family in Cassandra,
and multiple keyspaces, which would then map to your tables, are not really
frequently used. Whereas multiple tables in HBase are almost always the
rule.


Not sure that I would agree that a virtual dimension can compare to a real
one, but that is just the way I see it.


-- 
Regards Erik


Re: Storage format

2010-03-01 Thread Erik Holstad
So that is kind of what I want to do, but I want to go from a row with
multiple columns to multiple rows with one column each. Maybe I'm not
hearing you here, and you are trying to tell me that the columns (not
supercolumns) are not stored together in a row structure?


-- 
Regards Erik


Re: Is Cassandra a document based DB?

2010-03-01 Thread Erik Holstad
On Mon, Mar 1, 2010 at 4:41 AM, Brandon Williams  wrote:

> On Mon, Mar 1, 2010 at 5:34 AM, HHB  wrote:
>
>>
>> What are the advantages/disadvantages of Cassandra over HBase?
>>
>
> Ease of setup: all nodes are the same.
>
> No single point of failure: all nodes are the same.
>
> Speed: http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf
>
> Richer model: supercolumns.
>
I think there are people who would be of a different opinion here. As I've
understood it, Cassandra has table:key:name:val, and in some cases the val
is a serialized data structure. In HBase you have
table:row:family:key:val:version, which some people might consider richer.

>
> Multi-datacenter awareness.
>
> There are likely other things I'm forgetting, but those stand out for me.
>
> -Brandon
>



-- 
Regards Erik


Re: Storage format

2010-03-01 Thread Erik Holstad
Sorry about that!
Continuing:

And in that case, when using rows as indexes instead of columns, we only
need to read that specific row, which might be more efficient than reading a
big row every time?

-- 
Regards Erik


Re: Storage format

2010-03-01 Thread Erik Holstad
Thanks Jonathan!
So let's see if I got this right.
Just as an overview, data is stored like HashMap<columnName, value>, and in
the case of a superColumnFamily HashMap<superColumnName,
HashMap<columnName, value>>?

Does that mean that when using columns as indexes, when asking for a column
in a row, the whole row structure first needs to be deserialized and only
then can we get the columns we are looking for?

And in that case when using rows as indexes instead of columns we only need
to read

On Mon, Mar 1, 2010 at 11:24 AM, Jonathan Ellis  wrote:

> On Mon, Mar 1, 2010 at 12:50 PM, Erik Holstad 
> wrote:
> > I've been looking at the source, but not quite find the things I'm
> looking
> > for, so I have a few
> > questions.
> > Are columns for a row stored in a serialized data structure on disk or
> > stored individually and
> > put into a data structure when the call is being made?
>
> The former, but only for top-level columns -- subcolumns are all read
> at once for slices against supercolumns.
> (http://issues.apache.org/jira/browse/CASSANDRA-598)
>
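[Restating the model being confirmed above as a sketch: a standard
ColumnFamily behaves like a sorted map of column name to value per row, and
a super ColumnFamily adds one more level of nesting. Real code uses byte[]
names, IColumn objects and the configured comparator; Strings are used here
only for readability.]

import java.util.concurrent.ConcurrentSkipListMap;

public class ConceptualLayout {
    public static void main(String[] args) {
        // standard CF: row -> (column name -> value), kept sorted by name
        ConcurrentSkipListMap<String, String> row = new ConcurrentSkipListMap<String, String>();
        row.put("colA", "val1");
        row.put("colB", "val2");

        // super CF: row -> (supercolumn name -> (column name -> value))
        ConcurrentSkipListMap<String, ConcurrentSkipListMap<String, String>> superRow =
                new ConcurrentSkipListMap<String, ConcurrentSkipListMap<String, String>>();
        superRow.put("super1", row);

        System.out.println(superRow); // {super1={colA=val1, colB=val2}}
    }
}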



-- 
Regards Erik


Storage format

2010-03-01 Thread Erik Holstad
I've been looking at the source, but could not quite find the things I'm
looking for, so I have a few questions.
Are columns for a row stored in a serialized data structure on disk, or
stored individually and put into a data structure when the call is being
made? Because of the slice query, does that mean that all columns have to be
read in before any are sent back?
If that is the case, could it be more efficient to use rows instead of
columns for storing, for example, indexes where you just want to get a few
at a time?

-- 
Regards Erik


Deleted rows showing up when doing a get_range_slice query

2010-02-24 Thread Erik Holstad
When deleting rows from a table and then using a get_range_slice query, the
keys of the deleted rows show up, with no name/value pairs. What is the
reasoning behind this?

I have also seen a weird issue when using an MD5-generated byte[] as a
column name; it doesn't seem like it actually works. I can't get the value
that was inserted that way. But if I, for example, Base64.encode().getBytes()
it first, it seems to be OK. Any ideas?
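[This may or may not be what is biting Erik, but one classic client-side
cause of "my md5 byte[] column name does not work" in Java is using a raw
byte[] as a map key or comparing it with equals(): arrays use identity
equality, so two digests with the same contents look different. Wrapping in
ByteBuffer, or comparing with Arrays.equals, avoids that. Illustration only.]

import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ByteArrayKeyPitfall {
    public static void main(String[] args) throws Exception {
        byte[] a = MessageDigest.getInstance("MD5").digest("name".getBytes("UTF-8"));
        byte[] b = MessageDigest.getInstance("MD5").digest("name".getBytes("UTF-8"));

        System.out.println(a.equals(b));          // false: identity comparison
        System.out.println(Arrays.equals(a, b));  // true: contents match

        Map<byte[], String> bad = new HashMap<byte[], String>();
        bad.put(a, "value");
        System.out.println(bad.get(b));           // null, key "not found"

        Map<ByteBuffer, String> good = new HashMap<ByteBuffer, String>();
        good.put(ByteBuffer.wrap(a), "value");
        System.out.println(good.get(ByteBuffer.wrap(b))); // value
    }
}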
-- 
Regards Erik


Re: Getting the keys in your system?

2010-02-24 Thread Erik Holstad
Haha!
Yeah, fortunately we are only in the testing phase so this is not that big
of a deal.
Thanks a lot!

-- 
Regards Erik


Re: Getting the keys in your system?

2010-02-24 Thread Erik Holstad
Thanks Jonathan!
We are thinking about moving over to the OPP to be able to do this, and to
use an MD5 hash for some of the data, just to get the data written to
different nodes in the cases where order is not really needed. Is there
anything we need to think about when making the switch, or any big drawbacks
in doing so?

-- 
Regards Erik


Getting the keys in your system?

2010-02-24 Thread Erik Holstad
If you have a system set up using the RandomPartitioner and a couple of
indexes set up for your data, but realize that you need to add another
index, how do you get the keys for your data so that you know where to point
your indexes?
I guess what I'm really asking is: is there a way to get your keys when
using the RP, or how do people out there deal with something like this?

-- 
Regards Erik


Re: Row with many columns

2010-02-18 Thread Erik Holstad
Hey Rusian!
Maybe you should do what Ted suggested: look at what Cassandra is good at
and then try to change your data structure, from 10 rows with 10 columns to
maybe 10 rows with 10 columns each. I think the best way to solve a problem
is to look at the tools that you have at hand and try to use them for what
they are good at. If it is really hard to change your data set and you
really need the structure that you have, maybe Cassandra is not the best
option for you.
Good luck, and please let the mailing list know if you need any help with
this.


-- 
Regards Erik


Re: Using column plus value or only column?

2010-02-02 Thread Erik Holstad
Don't be silly, thanks a lot for helping me out!

-- 
Regards Erik


Re: Using column plus value or only column?

2010-02-02 Thread Erik Holstad
I don't understand what you mean ;)
We'll see what happens when we are done with this first project, and whether
we can get some time to give back.

-- 
Regards Erik


Re: Using column plus value or only column?

2010-02-02 Thread Erik Holstad
Hey Nate!
What I wanted to do with get_range_slice was to receive the keys in inverted
order, so that I could do offset/limit queries on key ranges in reverse
order. Like you said, this can be done for both columns and super columns
with the help of the SliceRange, but not on keys AFAIK, but maybe there is a
way?

Thanks Erik


On Tue, Feb 2, 2010 at 3:55 PM, Nathan McCall wrote:

> Erik,
> You can do an inverse with 'reversed=true' in SliceRange as part of
> the SlicePredicate for both get_slice or get_range_slice. I have not
> tried reverse=true on SuperColumn results, but I dont think there is
> any difference there - what can't be changed is how things are ordered
> but direction can go either way (If I am wrong on this, somebody
> please correct me).
>
> http://issues.apache.org/jira/browse/CASSANDRA-598 has not been on my
> radar as I dont have anything reporting-ish like you describe with
> SuperColumns (yet). I will defer to more experienced folks with this.
>
> Regards,
> -Nate
>
>
> On Tue, Feb 2, 2010 at 3:02 PM, Erik Holstad 
> wrote:
> > @Nathan
> > So what I'm planning to do is to store multiple sort orders for the same
> > data, where they all use the
> > same data table just fetches it in different orders, so to say. I want to
> be
> > able to read the different sort
> > orders from the front and from the back to get both regular and reverse
> sort
> > order.
> >
> > With your approach using super columns you would need to replicate all
> data,
> > right?
> >
> > And if I understand http://issues.apache.org/jira/browse/CASSANDRA-598
> > correctly you would need to
> > read the whole thing before you can limit the results handed back to you.
> >
> > In regards to the two calls get_slice and get_range_slice, the way I
> > understand it is that you hand
> > the second one an optional start and stop key plus a limit, to get a
> range
> > of keys/rows. I was planning
> > to use this call together with the OPP, but are thinking about not using
> it
> > since there is no way to do
> > an inverse scan, right?
> >
> > Thanks a lot
> > Erik
> >
> >
> > On Tue, Feb 2, 2010 at 2:39 PM, Jesse McConnell <
> jesse.mcconn...@gmail.com>
> > wrote:
> >>
> >> infinite is a bit of a bold claim
> >>
> >> by my understanding you are bound by the memory of the jvm as all of
> >> the content of a key/row currently needs to fit in memory for
> >> compaction, which includes columns and supercolumns for given key/row.
> >>
> >> if you are going to run into those scenarios then some sort of
> >> sharding on the keys is required, afaict
> >>
> >> cheers,
> >> jesse
> >>
> >> --
> >> jesse mcconnell
> >> jesse.mcconn...@gmail.com
> >>
> >>
> >>
> >> On Tue, Feb 2, 2010 at 16:30, Nathan McCall 
> >> wrote:
> >> > Erik,
> >> > Sure, you could and depending on the workload, that might be quite
> >> > efficient for small pieces of data. However, this also sounds like
> >> > something that might be better addressed with the addition of a
> >> > SuperColumn on "Sorts" and getting rid of "Data" altogether:
> >> >
> >> > Sorts : {
> >> >   sort_row_1 : {
> >> >sortKey1 : { col1:val1, col2:val2 },
> >> >sortKey2 : { col1:val3, col2:val4 }
> >> >   }
> >> > }
> >> >
> >> > You can have an infinite number of SuperColumns for a key, but make
> >> > sure you understand get_slice vs. get_range_slice before you commit to
> >> > a design. Hopefully I understood your example correctly, if not, do
> >> > you have anything more concrete?
> >> >
> >> > Cheers,
> >> > -Nate
> >> >
> >> >
> >> > On Tue, Feb 2, 2010 at 12:00 PM, Erik Holstad 
> >> > wrote:
> >> >> Thanks Nate for the example.
> >> >>
> >> >> I was thinking more a long the lines of something like:
> >> >>
> >> >> If you have a family
> >> >>
> >> >> Data : {
> >> >>   row1 : {
> >> >> col1:val1,
> >> >>   row2 : {
> >> >> col1:val2,
> >> >> ...
> >> >>   }
> >> >> }
> >> >>
> >> >>
> >> >> Using
> >> >> Sorts : {
> >> >>   sort_row : {
> >> >> sortKey1_datarow1: [],
> >> >> sortKey2_datarow2: []
> >> >>   }
> >> >> }
> >> >>
> >> >> Instead of
> >> >> Sorts : {
> >> >>   sort_row : {
> >> >> sortKey1: datarow1,
> >> >> sortKey2: datarow2
> >> >>   }
> >> >> }
> >> >>
> >> >> If that makes any sense?
> >> >>
> >> >> --
> >> >> Regards Erik
> >> >>
> >> >
> >
> >
> >
> > --
> > Regards Erik
> >
>



-- 
Regards Erik


Re: Using column plus value or only column?

2010-02-02 Thread Erik Holstad
@Nathan
So what I'm planning to do is to store multiple sort orders for the same
data, where they all use the same data table and just fetch it in different
orders, so to say. I want to be able to read the different sort orders from
the front and from the back to get both regular and reverse sort order.

With your approach using super columns you would need to replicate all data,
right?

And if I understand http://issues.apache.org/jira/browse/CASSANDRA-598
correctly, you would need to read the whole thing before you can limit the
results handed back to you.

In regards to the two calls get_slice and get_range_slice, the way I
understand it is that you hand the second one an optional start and stop key
plus a limit, to get a range of keys/rows. I was planning to use this call
together with the OPP, but am thinking about not using it since there is no
way to do an inverse scan, right?

Thanks a lot
Erik


On Tue, Feb 2, 2010 at 2:39 PM, Jesse McConnell
wrote:

> infinite is a bit of a bold claim
>
> by my understanding you are bound by the memory of the jvm as all of
> the content of a key/row currently needs to fit in memory for
> compaction, which includes columns and supercolumns for given key/row.
>
> if you are going to run into those scenarios then some sort of
> sharding on the keys is required, afaict
>
> cheers,
> jesse
>
> --
> jesse mcconnell
> jesse.mcconn...@gmail.com
>
>
>
> On Tue, Feb 2, 2010 at 16:30, Nathan McCall 
> wrote:
> > Erik,
> > Sure, you could and depending on the workload, that might be quite
> > efficient for small pieces of data. However, this also sounds like
> > something that might be better addressed with the addition of a
> > SuperColumn on "Sorts" and getting rid of "Data" altogether:
> >
> > Sorts : {
> >   sort_row_1 : {
> >sortKey1 : { col1:val1, col2:val2 },
> >sortKey2 : { col1:val3, col2:val4 }
> >   }
> > }
> >
> > You can have an infinite number of SuperColumns for a key, but make
> > sure you understand get_slice vs. get_range_slice before you commit to
> > a design. Hopefully I understood your example correctly, if not, do
> > you have anything more concrete?
> >
> > Cheers,
> > -Nate
> >
> >
> > On Tue, Feb 2, 2010 at 12:00 PM, Erik Holstad 
> wrote:
> >> Thanks Nate for the example.
> >>
> >> I was thinking more a long the lines of something like:
> >>
> >> If you have a family
> >>
> >> Data : {
> >>   row1 : {
> >> col1:val1,
> >>   row2 : {
> >> col1:val2,
> >> ...
> >>   }
> >> }
> >>
> >>
> >> Using
> >> Sorts : {
> >>   sort_row : {
> >> sortKey1_datarow1: [],
> >> sortKey2_datarow2: []
> >>   }
> >> }
> >>
> >> Instead of
> >> Sorts : {
> >>   sort_row : {
> >> sortKey1: datarow1,
> >> sortKey2: datarow2
> >>   }
> >> }
> >>
> >> If that makes any sense?
> >>
> >> --
> >> Regards Erik
> >>
> >
>



-- 
Regards Erik


Re: Using column plus value or only column?

2010-02-02 Thread Erik Holstad
Thanks Nate for the example.

I was thinking more along the lines of something like:

If you have a family

Data : {
  row1 : {
    col1:val1
  },
  row2 : {
    col1:val2
  },
  ...
}


Using
Sorts : {
  sort_row : {
sortKey1_datarow1: [],
sortKey2_datarow2: []
  }
}

Instead of
Sorts : {
  sort_row : {
sortKey1: datarow1,
sortKey2: datarow2
  }
}

If that makes any sense?
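[A sketch of the two layouts above with plain sorted maps, just to show the
trade-off: folding the data-row key into the column name saves the second
byte[] per column, but the name has to be split on read, whereas keeping it
in the value leaves one column per sort key. The example keys are invented.]

import java.util.TreeMap;

public class SortRowLayouts {
    public static void main(String[] args) {
        // Layout A: data-row key folded into the column name, empty value.
        TreeMap<String, byte[]> folded = new TreeMap<String, byte[]>();
        folded.put("sortKey1_datarow1", new byte[0]);
        folded.put("sortKey2_datarow2", new byte[0]);

        // Layout B: column name is the sort key, value is the data-row key.
        TreeMap<String, String> split = new TreeMap<String, String>();
        split.put("sortKey1", "datarow1");
        split.put("sortKey2", "datarow2");

        // Both iterate in sort-key order; layout A needs the name split on read.
        String first = folded.firstKey();
        String dataRow = first.substring(first.indexOf('_') + 1);
        System.out.println(dataRow + " / " + split.firstEntry().getValue());
    }
}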

-- 
Regards Erik


Re: Key/row names?

2010-02-02 Thread Erik Holstad
Thank you!

On Tue, Feb 2, 2010 at 9:41 AM, Jonathan Ellis  wrote:

> On Tue, Feb 2, 2010 at 11:36 AM, Erik Holstad 
> wrote:
> > Is there a way to use a byte[] as the key instead of a string?
>
> no.
>
> > If not what is the main reason for using strings for the key but
> > the columns and the values can be byte[]?
>
> historical baggage.  we might switch to byte[] keys in 0.7.
>
> -Jonathan
>



-- 
Regards Erik


Re: Reverse sort order comparator?

2010-02-02 Thread Erik Holstad
On Tue, Feb 2, 2010 at 9:57 AM, Brandon Williams  wrote:

> On Tue, Feb 2, 2010 at 11:39 AM, Erik Holstad wrote:
>
>>
>> Wow that sounds really good. So you are saying if I set it to reverse sort
>> order and count 10 for the first round I get the last 10,
>> for the next call I just set the last column from the first call to start
>> and I will get the columns -10- -20, so to speak?
>
>
> Actually, since they are reversed and you're trying to move backwards,
> you'll need to pass the last column from the first query (since they will be
> sorted in reverse order) as the start to the next one with reverse still set
> to true.
>
> -Brandon
>
>
Thanks a lot Brandon for clearing that up for me; I think that was what I
was trying to say. But that is really good, because now I don't have to
store the data twice in different sort orders.
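[A small simulation of the paging Brandon describes, using a plain sorted
map in place of a row: "reversed=true, count=10" corresponds to taking pages
from the descending view, and the next page starts at the last column of the
previous one (the real SliceRange start is inclusive, so the duplicate first
entry would be dropped). Only the standard library is used; the Thrift call
itself is not shown.]

import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ReversePagingSketch {
    public static void main(String[] args) {
        TreeMap<String, String> columns = new TreeMap<String, String>();
        for (int i = 1; i <= 25; i++) columns.put(String.format("col%02d", i), "v" + i);

        // First page: last 10 columns, highest names first (reversed=true, count=10).
        NavigableMap<String, String> reversed = columns.descendingMap();
        List<String> page = firstN(reversed.keySet(), 10);
        System.out.println(page);                       // col25 .. col16

        // Next page: keep the reversed view and start from the last column seen.
        String last = page.get(page.size() - 1);
        System.out.println(firstN(reversed.tailMap(last, false).keySet(), 10)); // col15 .. col06
    }

    private static List<String> firstN(Iterable<String> names, int n) {
        List<String> out = new ArrayList<String>();
        for (String s : names) { if (out.size() == n) break; out.add(s); }
        return out;
    }
}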



-- 
Regards Erik


Using column plus value or only column?

2010-02-02 Thread Erik Holstad
Sorry that there are a lot of questions from me this week, I'm just trying
to better understand the best way to use Cassandra :)

Let us say that you know the length of your key and everything is
standardized: are there people out there who just tag the value onto the key
so that you don't have to pay the extra overhead of the second byte[]?
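[A sketch of that packing for the fixed-length case the question assumes:
the sort key and the value are concatenated into a single column name and
split again by the known key length. KEY_LEN and the method names are made
up. Note that the column then sorts on the key bytes followed by the value
bytes, which only matters if two entries share the same key prefix.]

import java.util.Arrays;

public class KeyPlusValue {
    private static final int KEY_LEN = 8; // assumed fixed key length

    // Concatenate key and value into one byte[] to use as the column name.
    static byte[] pack(byte[] key, byte[] value) {
        byte[] name = new byte[KEY_LEN + value.length];
        System.arraycopy(key, 0, name, 0, KEY_LEN);
        System.arraycopy(value, 0, name, KEY_LEN, value.length);
        return name;
    }

    // Split the value back out of the column name on read.
    static byte[] unpackValue(byte[] name) {
        return Arrays.copyOfRange(name, KEY_LEN, name.length);
    }

    public static void main(String[] args) {
        byte[] name = pack(new byte[KEY_LEN], "hello".getBytes());
        System.out.println(new String(unpackValue(name))); // hello
    }
}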

-- 
Regards Erik


Re: Reverse sort order comparator?

2010-02-02 Thread Erik Holstad
On Tue, Feb 2, 2010 at 9:35 AM, Brandon Williams  wrote:

> On Tue, Feb 2, 2010 at 11:29 AM, Erik Holstad wrote:
>
>> Thanks guys!
>> So I want to use sliceRange but thinking about using the count parameter.
>> For example give me
>> the first x columns, next call I would like to call it with a start value
>> and a count.
>>
>> If I was to use the reverse param in sliceRange I would have to fetch all
>> the columns first, right?
>
>
> If you pass reverse as true, then instead of getting the first x columns,
> you'll get the last x columns.  If you want to head backwards toward the
> beginning, you can pass the first column as the end value.
>
> -Brandon
>
Wow, that sounds really good. So you are saying that if I set it to reverse
sort order and count 10 for the first round, I get the last 10; for the next
call I just set the start to the last column from the first call, and I will
get columns -10 to -20, so to speak?


-- 
Regards Erik


Key/row names?

2010-02-02 Thread Erik Holstad
Is there a way to use a byte[] as the key instead of a string?
If not, what is the main reason for using strings for the key while the
columns and the values can be byte[]? Is it just to be able to use it as the
key in a Map etc., or are there other reasons?
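[Since keys are Strings for now (Jonathan mentions byte[] keys might come in
0.7), a common client-side workaround, not from this thread, is to encode the
raw bytes, for example as hex, which also preserves byte order if an
order-preserving partitioner is ever used. A minimal sketch:]

public class KeyEncoding {
    static String toHexKey(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length * 2);
        for (byte b : raw) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    static byte[] fromHexKey(String key) {
        byte[] raw = new byte[key.length() / 2];
        for (int i = 0; i < raw.length; i++)
            raw[i] = (byte) Integer.parseInt(key.substring(2 * i, 2 * i + 2), 16);
        return raw;
    }

    public static void main(String[] args) {
        String k = toHexKey(new byte[] { 0, (byte) 0xff, 42 });
        System.out.println(k);                    // 00ff2a
        System.out.println(fromHexKey(k).length); // 3
    }
}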

-- 
Regards Erik


Re: Reverse sort order comparator?

2010-02-02 Thread Erik Holstad
Thanks guys!
So I want to use SliceRange, but I'm thinking about using the count
parameter. For example: give me the first x columns; on the next call I
would like to call it with a start value and a count.

If I were to use the reverse param in SliceRange, I would have to fetch all
the columns first, right?


On Tue, Feb 2, 2010 at 9:23 AM, Brandon Williams  wrote:

> On Tue, Feb 2, 2010 at 11:21 AM, Erik Holstad wrote:
>
>> Hey!
>> I'm looking for a comparator that sort columns in reverse order on for
>> example bytes?
>> I saw that you can write your own comparator class, but just thought that
>> someone must have done that already.
>
>
> When you get_slice, just set reverse to true in the SliceRange and it will
> reverse the order.
>
> -Brandon
>



-- 
Regards Erik


Reverse sort order comparator?

2010-02-02 Thread Erik Holstad
Hey!
I'm looking for a comparator that sorts columns in reverse order, on, for
example, bytes.
I saw that you can write your own comparator class, but I just thought that
someone must have done that already.
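[For reference, a reverse comparator for byte[] is short enough to write
inline; the class below is a client-side illustration only. Inside Cassandra
the column order comes from the ColumnFamily's CompareWith setting, and as
noted in the replies, a read can simply set reversed=true in the SliceRange
instead.]

import java.util.Comparator;

public class ReverseBytesComparator implements Comparator<byte[]> {
    // Unsigned lexicographic comparison with the sign flipped to sort descending.
    public int compare(byte[] a, byte[] b) {
        int min = Math.min(a.length, b.length);
        for (int i = 0; i < min; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) return -cmp;
        }
        return b.length - a.length; // the shorter array sorts last when reversed
    }
}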

-- 
Regards Erik


Re: Best design in Cassandra

2010-02-02 Thread Erik Holstad
On Tue, Feb 2, 2010 at 7:45 AM, Brandon Williams  wrote:

> On Tue, Feb 2, 2010 at 9:27 AM, Erik Holstad wrote:
>>
>> A supercolumn can still only compare subcolumns in a single way.
>>>
>> Yeah, I know that, but you can have a super column per sort order without
>> having to restart the cluster.
>>
>
> You get a CompareWith for the columns, and a CompareSubcolumnsWith for
> subcolumns.  If you need more column types to get different sort orders, you
> need another ColumnFamily.
>
I'm not sure what column types means. What I want to do is to have a few
things sorted in both asc and desc order, like {a,b}, {b,a} and {1,2},
{2,1}.

>
> -Brandon
>
>


-- 
Regards Erik


Re: How to retrieve keys from Cassandra ?

2010-02-02 Thread Erik Holstad
Hi Sebastien!
I'm totally new to Cassandra, but as far as I know there is no way of
getting just the keys that are in the database; they are not stored
separately, only together with the data itself.

Why do you want a list of keys, and what are you going to use them for?
Maybe there is another way of solving your problem.

What you are describing, getting all the keys/rows for a given column,
sounds like you would have to fetch all the data that you have and then
filter every key on your column. I don't think that get_key_range will even
do that for you; it says that it takes a column_family, but like I said, I'm
totally new to this.

Erik

2010/2/2 Sébastien Pierre 

> Hi all,
>
> I would like to know how to retrieve the list of available keys available
> for a specific column. There is the get_key_range method, but it is only
> available when using the OrderPreservingPartitioner -- I use a
> RandomPartitioner.
>
> Does this mean that when using a RandomPartitioner, you cannot see which
> keys are available in the database ?
>
>  -- Sébastien
>



-- 
Regards Erik


Re: Best design in Cassandra

2010-02-02 Thread Erik Holstad
On Mon, Feb 1, 2010 at 3:31 PM, Brandon Williams  wrote:

> On Mon, Feb 1, 2010 at 5:20 PM, Erik Holstad wrote:
>
>> Hey!
>> Have a couple of questions about the best way to use Cassandra.
>> Using the random partitioner + the multi_get calls vs order preservation +
>> range_slice calls?
>>
>
> When you use an OPP, the distribution of your keys becomes your problem.
>  If you don't have an even distribution, this will be reflected in the load
> on the nodes, while the RP gives you even distribution.
>

Yeah, that is why it would be nice to hear if anyone has compared the
performance between the two,
to see if it is worth worrying about your own distribution. I also read that
the random partitioner doesn't
give that great distribution.


>
> What is the benefit of using multiple families vs super column?
>
>
> http://issues.apache.org/jira/browse/CASSANDRA-598 is currently why I
> prefer simple CFs instead of supercolumns.
>
Yeah, this is nasty.

>
>
>> For example in the case of sorting
>> in different orders. One good thing that I can see here when using super
>> column is that you don't
>> have to restart your cluster every time you want to add something new
>> order.
>>
>
> A supercolumn can still only compare subcolumns in a single way.
>
Yeah, I know that, but you can have a super column per sort order without
having to restart the cluster.

>
> When http://issues.apache.org/jira/browse/CASSANDRA-44 is completed, you
> will be able to add CFs without restarting.
>
Looks interesting, but it's targeted at 0.7, so it is probably going to be a
little while, right?

>
> -Brandon
>



-- 
Regards Erik


Re: Sample applications

2010-02-02 Thread Erik Holstad
Hi Carlos!

I'm also really new to Cassandra but here are a couple of links that I found
useful:
http://wiki.apache.org/cassandra/ClientExamples
http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model

and one of the presentations like:
http://www.slideshare.net/jhammerb/data-presentations-cassandra-sigmod

Erik


Best design in Cassandra

2010-02-01 Thread Erik Holstad
Hey!
Have a couple of questions about the best way to use Cassandra.
Using the random partitioner + the multi_get calls vs order preservation +
range_slice calls?

What is the benefit of using multiple families vs a super column? For
example, in the case of sorting in different orders. One good thing that I
can see when using a super column is that you don't have to restart your
cluster every time you want to add a new sort order.


-- 
Regards Erik


Re: Internal structure of api calls

2010-02-01 Thread Erik Holstad
Thanks a lot Brandon!


Internal structure of api calls

2010-02-01 Thread Erik Holstad
Hey guys!

I'm totally new to Cassandra and have a couple of question about the
internal structure of some of the calls.

When using the slicerange(count) for the get calls, is the actual result
truncated on the server or does that happen on the client, i.e. is it more
efficient than the regular call?

Is there an internal counter for the get_count call that keeps track of the
count, or do you only save on return IO?

-- 
Regards Erik