Re: Cassandra API Library.

2012-09-04 Thread Filipe Gonçalves
@Brian: you can add the Cassandra::Simple Perl client
http://fmgoncalves.github.com/p5-cassandra-simple/

2012/8/27 Paolo Bernardi 

> On 08/23/2012 01:40 PM, Thomas Spengler wrote:
>
>> 4) pelops (Thrift,Java)
>>
>>
>>  I've been using Pelops for quite some time with pretty good results; it
> felt much cleaner than Hector.
>
> Paolo
>
> --
> @bernarpa
> http://paolobernardi.wordpress.com
>
>


-- 
Filipe Gonçalves


Multiget question

2011-11-04 Thread Filipe Gonçalves
Multiget slice queries seem to fetch rows sequentially, at least
from what I understood of the sources. This means the node that
receives a multiget of N keys does N get operations to the other nodes
in the cluster to fetch the remaining keys.
Am I right? Is this the way multiget works internally?
Also, shouldn't this be done in parallel, to avoid contacting
nodes more than once?
-- 
Filipe Gonçalves


Re: Multiget question

2011-11-04 Thread Filipe Gonçalves
Thanks for the answer.
I hadn't realised requests were made in parallel; I first noticed the
issue when multigets took linear time on machines with high loads.
Looking at the code led me to the previous conclusion (N gets for a
multiget of N keys). I agree it would take a major overhaul of the code
to change the current behaviour, possibly more than it's worth for the
potential gains.
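For reference, this is the client-side call whose internals we're discussing,
as a minimal pycassa sketch (keyspace, column family and keys are illustrative;
pycassa is just one of several clients that wrap the Thrift multiget_slice call):

import pycassa

pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'Standard1')

# one multiget_slice on the wire; the coordinator then issues one get per key
# to the replicas, in parallel, as Sylvain explains below
rows = cf.multiget(['key1', 'key2', 'key3'], column_count=10)
for key, columns in rows.items():
    print(key, columns)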


2011/11/4 Sylvain Lebresne :
> 2011/11/4 Filipe Gonçalves :
>> Multiget slice queries seem to fetch rows sequentially, at least
>> fromwhat I understood of the sources. This means the node that
>> receives amultiget of N keys does N get operations to the other nodes
>> in thecluster to fetch the remaining keys.
>> Am I right? Is this the way multiget works internally?
>
> The 'sequentially' is probably not right depending on what you meant
> by that (see below) but otherwise yes, a multiget of N keys is internally
>> split into N gets.
>
>> Also, shouldn't this be done in parallel, to avoid contacting
>> nodesmore than once?
>
> It's done in parallel, in that the coordinating node sends all the get
> requests in parallel. It doesn't wait for the result of the first get to
> issue the second one. But it does do every get separately, i.e. it may
> contact the same node multiple times.
>
> In theory we could do with at most one message to each node for each
> multiget. We don't do it because it would actually require quite a bit of
> change in the current code and it's unclear it would really buy us much.
> Since we already parallelize requests, we would mostly win a bit on network
> traffic (by merging messages) but there is a good chance this is insignificant
> (of course I could be wrong given we haven't tried).
>
> --
> Sylvain
>
>> --
>> Filipe Gonçalves
>>
>



-- 
Filipe Gonçalves


Re: Is there a way to get only keys with get_indexed_slices?

2011-11-11 Thread Filipe Gonçalves
You can: just set the number of columns returned to zero (the count
parameter in the slice range).

The indexed slices Thrift call is:

get_indexed_slices(ColumnParent column_parent, IndexClause
index_clause, SlicePredicate predicate, ConsistencyLevel
consistency_level)

The count parameter is in the SliceRange within the SlicePredicate (if
you are using the Thrift interface directly).
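A rough sketch against the raw Thrift interface. It assumes an already opened
Cassandra.Client named `client`, and the column family, indexed column and
values are illustrative (the ttypes module path may also differ depending on
how your bindings were generated):

from cassandra.ttypes import (ColumnParent, IndexClause, IndexExpression,
                              IndexOperator, SlicePredicate, SliceRange,
                              ConsistencyLevel)

parent = ColumnParent(column_family='Users')
clause = IndexClause(expressions=[IndexExpression(column_name='state',
                                                  op=IndexOperator.EQ,
                                                  value='active')],
                     start_key='', count=100)

# count=0 in the SliceRange means "return no columns", so only keys come back
predicate = SlicePredicate(slice_range=SliceRange(start='', finish='',
                                                  reversed=False, count=0))

key_slices = client.get_indexed_slices(parent, clause, predicate,
                                        ConsistencyLevel.ONE)
keys = [ks.key for ks in key_slices]  # each KeySlice still carries its row key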

2011/11/11 Maxim Potekhin :
>
> Is there a way to get only keys with get_indexed_slices?
> Looking at the code, it's not possible, but -- is there some way anyhow?
> I don't want to extract any data, just a list of matching keys.
>
> TIA,
>
> Maxim
>
>



-- 
Filipe Gonçalves


Re: is that possible to add more data structure(key-list) in cassandra?

2011-11-11 Thread Filipe Gonçalves
You could use composite columns. For example:

key:
  composite(listname:listindex) : value

A simple column range (slice) would give you the list in order, as you
would normally have it in any programming language, and a "get" could
give you direct access to any index.
Obviously, this would not be a good fit for lists which need insertion
at a specific index or reordering...
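A rough pycassa sketch of that layout (keyspace, column family, row key and
values are illustrative; it assumes the column family was created with a
CompositeType(UTF8Type, IntegerType) comparator):

import pycassa

pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
lists = pycassa.ColumnFamily(pool, 'Lists')  # comparator: CompositeType(UTF8Type, IntegerType)

# append elements to the list named 'queue' under one row key
lists.insert('user42', {('queue', 0): 'first',
                        ('queue', 1): 'second',
                        ('queue', 2): 'third'})

# direct access to a single index
item = lists.get('user42', columns=[('queue', 1)])

# the whole row comes back ordered by (listname, listindex)
whole_list = lists.get('user42', column_count=1000)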

2011/11/11 Yan Chunlu :
> I thought currently no one is maintaining the supercolumn-related code, and
> also it's not very efficient.
>
>
> On Fri, Nov 11, 2011 at 2:46 PM, Radim Kolar  wrote:
>>
>> Dne 11.11.2011 5:58, Yan Chunlu napsal(a):
>>>
>>> I think Cassandra is doing a great job as a key-value data store; it saved me
>>> tremendous work on maintaining data consistency and service availability.
>>>  But I think it would be great if it could support more data structures such
>>> as key-list. Currently I am using key-value to store the list, and it seems not
>>> very efficient. Redis has a good point on this but it is not easy to scale.
>>>
>>> Maybe it is a wrong place and wrong question, only curious if there is
>>> already solution about this, thanks a lot!
>>
>> use supercolumns unless your lists are very large.
>
>



-- 
Filipe Gonçalves


Re: Yanking a dead node

2011-11-24 Thread Filipe Gonçalves
Just remove its token from the ring using

nodetool removetoken <token of the dead node>
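A sketch of the whole sequence, run from any live node (placeholders in angle brackets):

nodetool -h <live_node> ring                       # note the token shown for the dead node
nodetool -h <live_node> removetoken <dead_token>   # removes it and re-replicates its ranges
nodetool -h <live_node> removetoken status         # check progress if it takes a while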

2011/11/23 Maxim Potekhin :
> This was discussed a long time ago, but I need to know what's the state of
> the art answer to that:
> assume one of my few nodes is very dead. I have no resources or time to fix
> it. Data is replicated
> so the data is still available in the cluster. How do I completely remove
> the dead node without having
> to rebuild it, repair, drain and decommission?
>
> TIA
> Maxim
>
>



-- 
Filipe Gonçalves


Re: Required field 'name' was not present! Struct: Column(name:null)

2011-11-27 Thread Filipe Gonçalves
It's a pretty straightforward error message: some of your rows have columns
with a missing or empty name (e.g. None or an empty string), and column names can't be empty.
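A small hedged pycassa sketch of what typically triggers it and one way to guard
against it (keyspace, column family, row key and values are illustrative):

import pycassa

pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
users = pycassa.ColumnFamily(pool, 'User')

row = {'password': 'password50', None: 'User 50'}   # a None (or '') column name sneaks in

# drop falsy column names before inserting, instead of letting Thrift reject the batch
clean = dict((name, value) for name, value in row.items() if name)
users.insert('user50', clean)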

2011/11/27 Masoud Moshref Javadi 

>   I get this error
>
> Required field 'name' was not present! Struct: Column(name:null)
>
> on different column families. My code is going to insert lots of rows in 
> parallel.
>
> I think this debug log from django may help:
>
> - /root/twiss/lib/python2.7/site-packages/pycassa/pool.py in new_f
>
>       if self.max_retries != -1 and self._retry_count > self.max_retries:
>           raise MaximumRetryException('Retried %d times. Last failure was %s: %s' %
>                                       (self._retry_count, exc.__class__.__name__, exc))
>       # Exponential backoff
>       time.sleep(_BASE_BACKOFF * (2 ** self._retry_count))
>
>       kwargs['reset'] = True
>       return new_f(self, *args, **kwargs)
>   ...
>       new_f.__name__ = f.__name__
>       return new_f
>
>   def _fail_once(self, *args, **kwargs):
>       if self._should_fail:
>
>   Local vars:
>     exc    = EOFError()
>     kwargs = {'reset': True}
>     args   = ({'user50': {'User': [
>                Mutation(column_or_supercolumn=ColumnOrSuperColumn(
>                    column=Column(timestamp=1322382778794088, name='password',
>                                  value='password50', ttl=None),
>                    counter_super_column=None, super_column=None,
>                    counter_column=None), deletion=None),
>                Mutation(column_or_supercolumn=ColumnOrSuperColumn(
>                    column=Column(timestamp=1322382778794088, name='name',
>                                  value='User 50', ttl=None),
>                    counter_super_column=None, super_column=None,
>                    counter_column=None), deletion=None)]}},
>               1)
>
> - /root/twiss/lib/python2.7/site-packages/pycassa/pool.py in new_f
>
>           result = f(self, *args, **kwargs)
>           self._retry_count = 0  # reset the count after a success
>           return result
>       except Thrift.TApplicationException, app_exc:
>           self.close()
>           self._pool._decrement_overflow()
>           self._pool._clear_current()
>           raise app_exc
>   ...
>       except (TimedOutException, UnavailableException, Thrift.TException,
>               socket.error, IOError, EOFError), exc:
>           self._pool._notify_on_failure(exc, server=self.server, connection=self)
>           self.close()
>           self._pool._decrement_overflow()
>
>   Local vars:
>     app_exc = TApplicationException(None,)
>     kwargs  = {}
>     args    = (the same mutation map as above, 1)
>


-- 
Filipe Gonçalves


Re: Setting Key Validation Class

2011-12-05 Thread Filipe Gonçalves
Cassandra's data model is NOT table based. The key is not a column; it
is a separate value. "index_type: KEYS" means you are creating a
secondary index on that column, and that index can only be accessed
with an equality query (column = x).

You should probably read http://www.datastax.com/docs/1.0/ddl/index first.

2011/12/5 Dinusha Dilrukshi :
> Hi,
>
> I am using apache-cassandra-1.0.0 and I tried to insert/retrieve data in a
> column family using cassandra-jdbc program.
> Here is how I created 'USER' column family using cassandra-cli.
>
> create column family USER with comparator=UTF8Type
> and column_metadata=[{column_name: user_id, validation_class: UTF8Type,
> index_type: KEYS},
> {column_name: username, validation_class: UTF8Type, index_type: KEYS},
> {column_name: password, validation_class: UTF8Type}];
>
> But, when i try to insert data to USER column family it gives the error
> "java.sql.SQLException: Mismatched types: java.lang.String cannot be cast to
> java.nio.ByteBuffer".
>
> Since I have set user_id as a KEY and its validation_class as UTF8Type, I
> expected the Key Validation Class to be UTF8Type.
> But when I look at the meta-data of the USER column family it shows "Key
> Validation Class: org.apache.cassandra.db.marshal.BytesType", which has caused
> the above error.
>
> When I created USER column family as follows, it solves the above issue.
>
> create column family USER with comparator=UTF8Type and
> key_validation_class=UTF8Type
> and column_metadata=[{column_name: user_id, validation_class: UTF8Type,
> index_type: KEYS},
> {column_name: username, validation_class: UTF8Type, index_type: KEYS},
> {column_name: password, validation_class: UTF8Type}];
>
> Do we always need to define key_validation_class as in the above query?
> Isn't it enough to add validation classes for each column?
>
> Regards,
> ~Dinusha~
>



-- 
Filipe Gonçalves


Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys

2011-12-20 Thread Filipe Gonçalves
Generally, RandomPartitioner is the recommended one.
If you already provide randomized keys it doesn't make much of a
difference; the nodes should be balanced with any partitioner.
However, unless you have UUIDs in all keys of all column families
(highly unlikely), ByteOrderedPartitioner and
OrderPreservingPartitioner will lead to hotspots and unbalanced
rings.
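For reference, the partitioner is a cluster-wide setting in cassandra.yaml and
cannot be changed once data has been written; the relevant line looks like this
(a sketch, paths in your install may vary):

partitioner: org.apache.cassandra.dht.RandomPartitioner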

2011/12/20 Drew Kutcharian :
> Hey Guys,
>
> I just came
> across http://wiki.apache.org/cassandra/ByteOrderedPartitioner and it got me
> thinking. If the row keys are java.util.UUID which are generated randomly
> (and securely), then what type of partitioner would be the best? Since the
> key values are already random, would it make a difference to use
> RandomPartitioner or one can use ByteOrderedPartitioner or
> OrderPreservingPartitioning as well and get the same result?
>
> -- Drew
>



-- 
Filipe Gonçalves


Re: improving cassandra-vs-mongodb-vs-couchdb-vs-redis

2011-12-28 Thread Filipe Gonçalves
There really is no generic way of comparing these systems; NoSQL
databases are highly heterogeneous.
The only credible and accurate way of doing a comparison is for a
specific, well-defined use case. Other than that you are always going
to be comparing apples to oranges, and ending up with a crappy (and
often inaccurate) comparison to work with.
Some engineers (at Facebook, Twitter and Netflix among others, if I'm
not mistaken) have written interesting articles describing where and
why their companies use each database; google those for a minimally
accurate perspective of the NoSQL (and, in some cases, SQL) database
world.

2011/12/28 CharSyam :
> Don't trust NoSQL benchmarks. It's not that they lie, but NoSQL systems
> perform very differently in different environments.
>
> Do a benchmark in your own, real environment, and choose based on that.
>
> Thank you.
>
>
> 2011/12/28 Igor Lino 
>>
>> You are totally right. I'm far from being an expert on the subject, but
>> the comparison felt inconsistent and incomplete. (I did not say so in my
>> 1st email, so as not to bias opinions.)
>>
>> Do you know of any similar comparison, which is not biased towards some
>> particular technology or solution?   (so not coming from
>> http://cassandra.apache.org/)
>> I want to understand how superior Cassandra's latest release is compared
>> to its closest competitors, ideally with opinions from expert users.
>>
>>
>> On Wed, Dec 28, 2011 at 12:14 AM, Edward Capriolo 
>> wrote:
>>
>>    This is not really a comparison of anything because each NoSQL has its
>> own bullet points like:
>>    Boats
>>      great for traveling on water
>>    Cars
>>      great for traveling on land
>>    So the conclusion I should gather is?
>>    Also as for the Cassandra bullet points, they are really thin (and
>> wrong). Such as:
>>    Cassandra:
>>    Best used: When you write more than you read (logging). If every
>> component of the system must be in Java. ("No one gets fired for choosing
>> Apache's stuff.")
>>    I view that as:
>>    Nonsensical, inaccurate, and anecdotal.
>>    Also I notice on the other side (and not trying to pick on hbase, but)
>>    hbase:
>>    No single point of failure
>>    Random access performance is like MySQL
>    HBase has several SPOFs, and its random access performance is definitely
>> NOT 'like mysql'.
>    Cassandra ACTUALLY has no SPOF, but as the author mentions, he/she does
>> not like Cassandra, so that fact was left out.
>>    From what I can see of the writeup, it is obviously inaccurate in
>> numerous places (without even reading the entire thing).
>    Also when comparing these technologies, very subtle differences in
>> design have profound effects on operation and performance. Thus someone
>> trying to paper over 6 technologies and compare them with a few bullet
>> points is really doing the world an injustice.
>>    On Tue, Dec 27, 2011 at 5:01 PM, Igor Lino  wrote:
>>
>>        Hi!
>>
>>        I was trying to get an understanding of the real strengths of
>> Cassandra against other competitors. It's actually not that simple and
>> depends a lot on the details of the actual requirements.
>>
>>        Reading the following comparison:
>>        http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
>>
>>        It felt like the description of Cassandra painted a limiting
>> picture of its capabilities. Is there any Cassandra expert that could
>> improve that summary? is there any important thing missing? or is there a
>> more fitting common use case for Cassandra than what Mr. Kovacs has given?
>>        (I believe/think that a Cassandra expert can improve that generic
>> description)
>>
>>        Thanks,
>>        Igor
>>
>>
>>
>



-- 
Filipe Gonçalves


Re: Concurrency Control

2012-05-30 Thread Filipe Gonçalves
It's the timestamps provided in the columns that do concurrency
control/conflict resolution. Basically, the newer timestamp wins.
For counters I think there is no such mechanism (i.e. counter updates are
not idempotent).
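A minimal pycassa sketch of the timestamp rule (keyspace, column family and
values are illustrative):

import pycassa

pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'Standard1')

cf.insert('row1', {'col': 'newer'}, timestamp=2000)
cf.insert('row1', {'col': 'older'}, timestamp=1000)  # lower timestamp, so it loses

print(cf.get('row1'))  # returns the 'newer' value: the write with the higher timestamp wins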

From https://wiki.apache.org/cassandra/DataModel :

> All values are supplied by the client, including the 'timestamp'. This
> means that clocks on the clients should be synchronized (in the Cassandra
> server environment is useful also), as these timestamps are used for
> conflict resolution. In many cases the 'timestamp' is not used in client
> applications, and it becomes convenient to think of a column as a
> name/value pair. For the remainder of this document, 'timestamps' will be
> elided for readability. It is also worth noting the name and value are
> binary values, although in many applications they are UTF8 serialized
> strings.
> Timestamps can be anything you like, but microseconds since 1970 is a
> convention. Whatever you use, it must be consistent across the application,
> otherwise earlier changes may overwrite newer ones.


2012/5/28 Helen 

> Hi,
> what kind of Concurrency Control Method is used in Cassandra? I found out
> so far
> that it's not done with the MVCC Method and that no vector clocks are
> being used.
> Thanks Helen
>
>


-- 
Filipe Gonçalves