Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Safdar Kureishy
An additional detail is that the CPU utilization on those nodes is
proportional to the load below, so machines 9.9.9.1 and 9.9.9.3 experience
a fraction of the CPU load compared to the remaining 3 nodes. This might
further point to the possibility that few keys are hashing to the
token ranges on those nodes. I'm no expert at cryptography, but is it
possible that web URLs are not evenly distributed by MD5 hashing due to
the common prefixes they contain (such as the "http://" prefix, or perhaps
a domain name)? What's also interesting is that the distribution is
more-or-less even across *alternating* nodes... (0, 2, 4 -- vs -- 1, 3).
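
One quick way to test the common-prefix worry directly is to MD5 a handful of
URLs and see which fifth of the token space each one lands in. A minimal Java
sketch (the URLs are made up, and the bucketing ignores the ring's exact
open/closed interval rules):

import java.math.BigInteger;
import java.security.MessageDigest;

public class Md5TokenCheck {
    // RandomPartitioner's token space is [0, 2^127]
    static final BigInteger RANGE = BigInteger.ONE.shiftLeft(127);

    public static void main(String[] args) throws Exception {
        String[] urls = {
            "http://example.com/a", "http://example.com/b",
            "http://example.org/page1", "http://foo.net/x/y"
        };
        MessageDigest md = MessageDigest.getInstance("MD5");
        for (String url : urls) {
            // abs() of the signed MD5 value, which is what RandomPartitioner does
            BigInteger token = new BigInteger(md.digest(url.getBytes("UTF-8"))).abs();
            // which of the five evenly spaced ranges the token falls into
            int bucket = token.multiply(BigInteger.valueOf(5)).divide(RANGE).intValue();
            System.out.println(bucket + "  " + token + "  " + url);
        }
    }
}

Running this over a sample of the real crawl keys should fill all five buckets
roughly evenly if the hashing itself is sound.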

Thanks,
Safdar


On Sun, Jun 24, 2012 at 6:00 PM, Safdar Kureishy
safdar.kurei...@gmail.com wrote:

 Hi,

 I've searched online but was unable to find any leads for the problem
 below. This mailing list seemed the most appropriate place. Apologies in
 advance if that isn't the case.

 I'm running a 5-node Solandra cluster (Solr + Cassandra). I've set up the
 nodes with tokens *evenly distributed across the token space* for a
 5-node cluster (as evidenced below under the effective-ownership column
 of the nodetool ring output). My data is a set of a few million web pages,
 crawled using Nutch and indexed using the solrindex command available
 through Nutch. AFAIK, the key for each document generated from the crawled
 data is the URL.

 Based on the load values for the nodes below, despite adding about 3
 million web pages to this index via the HTTP REST API (e.g.:
 http://9.9.9.x:8983/solandra/index/update), some nodes are still
 empty. Specifically, nodes 9.9.9.1 and 9.9.9.3 have just a few kilobytes
 (shown in *bold* below) of the index, while the remaining 3 nodes are
 consistently getting hammered by all the data. If the RandomPartitioner
 (which is what I'm using for this cluster) is supposed to achieve an even
 distribution of keys across the token space, why is the data below
 skewed in this fashion? Literally, no key has yet been hashed to
 nodes 9.9.9.1 and 9.9.9.3 below. Could someone possibly shed some light on
 this absurdity?

 [me@hm1 solandra-app]$ bin/nodetool -h hm1 ring
 Address  DC           Rack   Status  State   Load        Effective-Ownership  Token
                                                                               136112946768375385385349842972707284580
 9.9.9.0  datacenter1  rack1  Up      Normal  7.57 GB     20.00%               0
 9.9.9.1  datacenter1  rack1  Up      Normal  *21.44 KB*  20.00%               34028236692093846346337460743176821145
 9.9.9.2  datacenter1  rack1  Up      Normal  14.99 GB    20.00%               68056473384187692692674921486353642290
 9.9.9.3  datacenter1  rack1  Up      Normal  *50.79 KB*  20.00%               102084710076281539039012382229530463435
 9.9.9.4  datacenter1  rack1  Up      Normal  15.22 GB    20.00%               136112946768375385385349842972707284580

 Thanks in advance.

 Regards,
 Safdar



Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Dave Brosius
If I read what you are saying, you are _not_ using composite keys? 
That's one thing that could do it, if the first part of the composite 
key had a very, very low cardinality.






Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Safdar Kureishy
Hi Dave,

Would you mind elaborating a bit more on that, preferably with an example?
AFAIK, Solandra uses the unique id of the Solr document as the input for
calculating the MD5 hash for shard/node assignment. In this case the ids
are just millions of varied web URLs that do *not* follow any common
pattern. I'm not sure if that answers your question below?

Thanks,
Safdar






Re: wildcards as both ends

2012-06-24 Thread aaron morton
 I'm wondering how or if it's possible to implement efficient wildcards at 
 both ends, e.g. *string*
No. 

 - if I can get another equality constraint which narrows down the potential 
 result set significantly, I can do a scan. I'm not sure how feasible this is 
 without benchmarks. Does anyone know if I can scan a couple hundred / 
 thousand rows in a 3-node, replication factor = 2 cluster quickly?

Not efficiently. 

If you need full-text capabilities look at Solr, Solandra (the Solr-on-Cassandra 
port) or DataStax Enterprise. 
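
For what it's worth, the usual index-friendly workaround for double-ended
wildcards is n-gram indexing (essentially what Solr's NGram token filters
give you). A minimal in-memory Java sketch of the idea; a real deployment
would keep the postings in Lucene or a column family:

import java.util.*;

public class TrigramIndex {
    // trigram -> set of stored strings containing it
    private final Map<String, Set<String>> postings = new HashMap<String, Set<String>>();

    public void add(String value) {
        for (int i = 0; i + 3 <= value.length(); i++) {
            String gram = value.substring(i, i + 3);
            Set<String> set = postings.get(gram);
            if (set == null) {
                set = new HashSet<String>();
                postings.put(gram, set);
            }
            set.add(value);
        }
    }

    // Intersect the postings of every trigram in the query, then verify the
    // survivors actually contain the query as a substring.
    // Queries shorter than 3 characters would need a scan fallback.
    public Set<String> search(String query) {
        Set<String> result = null;
        for (int i = 0; i + 3 <= query.length(); i++) {
            Set<String> hits = postings.get(query.substring(i, i + 3));
            if (hits == null) return Collections.emptySet();
            if (result == null) result = new HashSet<String>(hits);
            else result.retainAll(hits);
        }
        if (result == null) return Collections.emptySet();
        Iterator<String> it = result.iterator();
        while (it.hasNext()) {
            if (!it.next().contains(query)) it.remove();
        }
        return result;
    }
}

Storage grows by roughly one posting per character of each indexed string,
which is far cheaper than indexing every prefix of every suffix.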

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 21/06/2012, at 2:20 AM, Sam Z J wrote:

 Hi all
 
 I'm wondering how or if it's possible to implement efficient wildcards at 
 both ends, e.g. *string*
 
 I can think of a few options... please comment, thanks =D
 
 - if I can get another equality constraint which narrows down the potential 
 result set significantly, I can do a scan. I'm not sure how feasible this is 
 without benchmarks. Does anyone know if I can scan a couple hundred / 
 thousand rows in a 3-node, replication factor = 2 cluster quickly?
 
 - for each string I have, index all the prefixes in a column family, e.g. for 
 string 'string', I'd have rows string, strin, stri, str, st, s, with column 
 values somehow pointing back as row keys. This almost blows up the storage 
 needed =/ (also, what do I do if I hit the 2-billion-column row width limit? is 
 there a way to say 'insert into another row if the current one is full'?)
 
 thanks
 
 -- 
 Zhongshi (Sam) Jiang
 sammyjiang...@gmail.com



Re: Tiered compaction on two disks

2012-06-24 Thread aaron morton
 I have a Cassandra installation where we plan to store 1TB of data, split 
 between two 1TB disks.
In general it's a good idea to limit the per-node storage to 300GB to 400GB. 
This has more to do with operational issues than any particular issue with 
Cassandra. However, storing a very large number of keys on a single node can 
result in high memory usage while the server is idling, and reduced read 
performance. 
 
 I know that tiered compaction needs 50% free disk space for the worst-case 
 situation. 
Not really nowadays, but it's a good idea to treat 50% as a soft limit. 

 How does this combine with the disk split? 
Whenever a new file is written to disk it will use the data directory with the 
most space. In general we recommend using a single data directory. 

Hope that helps. 


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 21/06/2012, at 10:56 PM, Flavio Baronti wrote:

 Hi,
 
 I have a Cassandra installation where we plan to store 1TB of data, split 
 between two 1TB disks.
 Tiered compaction should be better suited for our workload (append-only, 
 deletion of old data, few reads).
 I know that tiered compaction needs 50% free disk space for the worst-case 
 situation. How does this combine with the disk split? What happens if I have 
 500GB of data in one disk and 500GB in the other? Won't compaction try to 
 build a single 1TB file, failing since there are only 500GB free on each disk?
 
 Flavio
 



Re: Weird behavior in Cassandra 1.1.0 - throwing unconfigured CF exceptions when the CF is present

2012-06-24 Thread aaron morton
I would check if the schemas have diverged; run describe cluster in the CLI. 
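
For reference, that looks like the following (the host is just a placeholder);
if more than one schema version is listed, the nodes have diverged:

bin/cassandra-cli -h localhost
[default@unknown] describe cluster;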

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/06/2012, at 12:22 AM, Tharindu Mathew wrote:

 Hi,
 
 I'm having issues with Hector 1.1.0 and Cassandra 1.1.0.
 
 I'm adding a column family dynamically, and after sleeping for some time and 
 making sure that the column family is created using 
 keyspacedefinition.getCFs, I still get unconfigured column family exceptions.
 
 Even after some time, if I try to insert data I still get unconfigured CF 
 exceptions. Below at [1], I have inserted logs to specifically print all the 
 CFs before inserting data. The CF is present in the list, but the insert still 
 fails. Note that this does not happen for all data. Some data does get 
 inserted.
 
 I'm baffled as to what could be the reason. Any help would be really 
 appreciated.
 
 [1] -
 
 [2012-06-21 17:22:21,680]  INFO 
 {org.wso2.carbon.eventbridge.streamdefn.cassandra.datastore.CassandraConnector}
  -  Keyspace desc. : 
 ThriftKsDef[name=EVENT_KS,strategyClass=org.apache.cassandra.locator.SimpleStrategy,strategyOptions={replication_factor=1},cfDefs=[ThriftCfDef[keyspace=EVENT_KS,name=org_wso2_bam_kp,columnType=STANDARD,comparatorType=me.prettyprint.hector.api.ddl.ComparatorType@c89abe1,subComparatorType=null,comparatorTypeAlias=,subComparatorTypeAlias=,comment=,rowCacheSize=0.0,rowCacheSavePeriodInSeconds=0,keyCacheSize=0.0,readRepairChance=1.0,columnMetadata=[],gcGraceSeconds=864000,keyValidationClass=org.apache.cassandra.db.marshal.BytesType,defaultValidationClass=org.apache.cassandra.db.marshal.BytesType,id=1004,maxCompactionThreshold=32,minCompactionThreshold=4,memtableOperationsInMillions=0.0,memtableThroughputInMb=0,memtableFlushAfterMins=0,keyCacheSavePeriodInSeconds=0,replicateOnWrite=true,compactionStrategy=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,compactionStrategyOptions={},compressionOptions={sstable_compression=org.apache.cassandra.io.compress.SnappyCompressor},mergeShardsChance=0.0,rowCacheProvider=null,keyAlias=null,rowCacheKeysToSave=0]],durableWrites=true]
 
 [2012-06-21 17:22:21,681]  INFO 
 {org.wso2.carbon.eventbridge.streamdefn.cassandra.datastore.CassandraConnector}
  -  CFs present 
 cf name : org_wso2_bam_kp
 
 [2012-06-21 17:22:21,683] ERROR 
 {org.wso2.carbon.eventbridge.streamdefn.cassandra.subscriber.BAMEventSubscriber}
  -  Error processing event. 
 Event{streamId='org.wso2.bam.kp-1.0.5-6b80ca6c-1ad9-4495-a872-8466c424c5d0', 
 timeStamp=1340279541606, metaData=[external], metaData=null, 
 payloadData=[Orange, 1.0, 520.0, Ivan]}
 me.prettyprint.hector.api.exceptions.HInvalidRequestException: 
 InvalidRequestException(why:unconfigured columnfamily org_wso2_bam_kp)
   at 
 me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:45)
   at 
 me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:264)
   at 
 me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:97)
   at 
 me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
   at 
 org.wso2.carbon.eventbridge.streamdefn.cassandra.datastore.CassandraConnector.insertEvent(CassandraConnector.java:361)
   at 
 org.wso2.carbon.eventbridge.streamdefn.cassandra.subscriber.BAMEventSubscriber.receive(BAMEventSubscriber.java:42)
   at 
 org.wso2.carbon.eventbridge.core.internal.queue.QueueWorker.run(QueueWorker.java:64)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:680)
 Caused by: InvalidRequestException(why:unconfigured columnfamily 
 org_wso2_bam_kp)
   at 
 org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:20169)
   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
   at 
 org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:913)
   at 
 org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:899)
   at 
 me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:246)
   at 
 me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:243)
   at 
 me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
   at 
 me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:258)
   ... 11 more
 
 -- 
 Regards,
 
 Tharindu
 
 blog: http://mackiemathew.com/
 



Re: Starting cassandra with -D option

2012-06-24 Thread aaron morton
  Idea is to avoid having the copies of cassandra code in each node,
If you run cassandra from the NAS you are adding a single point of failure into 
the system. 

Better to use some form of deployment automation and install all the 
requirement components onto each node. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/06/2012, at 12:29 AM, Flavio Baronti wrote:

 The option must actually include the name of the yaml file as well:
 
 -Dcassandra.config=file:///Users/walmart/Downloads/Cassandra/Node2-Cassandra1.1.0/conf/cassandra.yaml
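 
 i.e. the full invocation becomes:
 
 bin/cassandra -Dcassandra.config=file:///Users/walmart/Downloads/Cassandra/Node2-Cassandra1.1.0/conf/cassandra.yaml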
 
 
 Flavio
 
 
 On 6/21/2012 at 13:16, Roshni Rajagopal wrote:
 Hi Folks,
 
    We wanted to have a single cassandra installation, and use it to start 
 cassandra on other nodes by passing it the cassandra configuration 
 directories as a parameter. The idea is to avoid having copies of the 
 cassandra code on each node, and starting each node by getting into 
 bin/cassandra on that node.
 
 
 As per http://www.datastax.com/docs/1.0/references/cassandra,
 we have an option -D where we can supply some parameters to cassandra.
 Has anyone tried this?
 I'm getting an error as below.
 
 walmarts-MacBook-Pro-2:Node1-Cassandra1.1.0 walmart$  bin/cassandra  
 -Dcassandra.config=file:///Users/walmart/Downloads/Cassandra/Node2-Cassandra1.1.0/conf
 walmarts-MacBook-Pro-2:Node1-Cassandra1.1.0 walmart$  INFO 15:38:01,763 
 Logging initialized
  INFO 15:38:01,766 JVM vendor/version: Java HotSpot(TM) 64-Bit Server 
 VM/1.6.0_31
  INFO 15:38:01,766 Heap size: 1052770304/1052770304
  INFO 15:38:01,766 Classpath: 
 bin/../conf:bin/../build/classes/main:bin/../build/classes/thrift:bin/../lib/antlr-3.2.jar:bin/../lib/apache-cassandra-1.1.0.jar:bin/../lib/apache-cassandra-clientutil-1.1.0.jar:bin/../lib/apache-cassandra-thrift-1.1.0.jar:bin/../lib/avro-1.4.0-fixes.jar:bin/../lib/avro-1.4.0-sources-fixes.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/compress-lzf-0.8.4.jar:bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:bin/../lib/guava-r08.jar:bin/../lib/high-scale-lib-1.1.2.jar:bin/../lib/jackson-core-asl-1.9.2.jar:bin/../lib/jackson-mapper-asl-1.9.2.jar:bin/../lib/jamm-0.2.5.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-0.7.0.jar:bin/../lib/log4j-1.2.16.jar:bin/../lib/metrics-core-2.0.3.jar:bin/../lib/mx4j-tools-3.0.1.jar:bin/../lib/servlet-api-2.5-20081211.jar:bin/../lib/slf4j-api-1.6.1.jar:bin/../lib/slf4j-log4j12-1.6.1.jar:bin/../lib/snakeyaml-1.6.jar:bin/../lib/snappy-java-1.0.4.1.jar:bin/../lib/snaptree-0.1.jar:bin/../lib/jamm-0.2.5.jar
  INFO 15:38:01,768 JNA not found. Native methods will be disabled.
  INFO 15:38:01,826 Loading settings from 
 file:/Users/walmart/Downloads/Cassandra/Node2-Cassandra1.1.0/conf
 ERROR 15:38:01,873 Fatal configuration error error
 Can't construct a java object for 
 tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=No single 
 argument constructor found for class org.apache.cassandra.config.Config
  in reader, line 1, column 1:
 cassandra.yaml
 
 The other option would be to modify cassandra.in.sh.
 Has anyone tried this?
 
 Regards,
 Roshni
 
 
 
 
 



Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Dave Brosius

Well, it sounds like this doesn't apply to you.

If you had set up your column family in CQL with PRIMARY KEY 
(domain_name, path) or something like that, and were looking at lots 
and lots of URL pages (domain_name + path) but from a very small number 
of domain_names, then the partition key being just the domain_name could 
account for an uneven distribution.


But it sounds like your key is just a URL, so that should (in theory) be 
fine.











Re: Cassandra 1.0.6 data flush query

2012-06-24 Thread aaron morton
 memtable_total_space_in_mb: 200 
This means cassandra tries to use less than 200MB of real memory to hold 
memtables. The problem is java takes a lot more memory to hold data than it 
takes to store it on disk. You can see the ratio of serialized to live bytes 
logged from the Memtable with messages like setting live ratio… It can be 
anywhere from 1 to 64. 

So if the live ratio is 10, your 10MB SSTable is taking 100MB in RAM. 

In short, add more RAM to the VM. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/06/2012, at 3:58 PM, Roshan wrote:

 Hi
 
 I am using Cassandra 1.0.6 in our production system and noticed that
 Cassandra is flushing the data to SSTables and the file size is under 10MB.
 Under moderate write load, Cassandra is flushing lots of memtables with
 small sizes, and because of this, compaction is doing lots of compactions.
 
 O/S - Centos 64bit Sun Java 1.6_31
 VM size - 2.4GB
 
 Following parameters change on cassandra.yaml file.
 
 flush_largest_memtables_at: 0.45 (reduce it from .75)
 reduce_cache_sizes_at: 0.55 (reduce it from .85)
 reduce_cache_capacity_to: 0.3 (reduce it from .6)
 concurrent_compactors: 1
 memtable_total_space_in_mb: 200 
 in_memory_compaction_limit_in_mb: 16 (from 64MB)
 Key cache = 1
 Row cache = 0
 Could someone please help me on this.
 
 Thanks
 /Roshan
 
 



Re: Column names overhead

2012-06-24 Thread aaron morton
 What is the penalty for using longer column names?
Each column name is stored in each -Data file where a value is stored for it. So 
if you have lots of overwrites the column name may be stored in many places. 

 Should I sacrifice longer self-explanatory names for shorter cryptic ones to 
 save the disk space?
If you have lots of COBOL programmers around it may be OK. If you are at the 
extremes of capacity it may also be OK.  
You may also get some value by storing the schema separately from the data. 
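
A minimal sketch of that idea: keep a client-side dictionary from descriptive
names to the short names actually stored, so code stays readable while disk
stays small (the names here are hypothetical):

import java.util.HashMap;
import java.util.Map;

public class ColumnNames {
    // descriptive name used in code -> short name stored in Cassandra
    private static final Map<String, String> LONG_TO_SHORT = new HashMap<String, String>();
    static {
        LONG_TO_SHORT.put("event_timestamp", "ts");
        LONG_TO_SHORT.put("user_agent", "ua");
        LONG_TO_SHORT.put("http_status_code", "st");
    }

    public static String toStored(String longName) {
        String shortName = LONG_TO_SHORT.get(longName);
        if (shortName == null)
            throw new IllegalArgumentException("unknown column: " + longName);
        return shortName;
    }
}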

 On one hand, I understand that a Cassandra row is a key-value map, but on 
 the other hand, it probably uses compression when storing them.
Compression is (currently) off by default, see 
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/06/2012, at 5:03 AM, Leonid Ilyevsky wrote:

 What is the penalty for using longer column names?
 Should I sacrifice longer self-explanatory names for shorter cryptic ones to 
 save the disk space?
 On one hand, I understand that a Cassandra row is a key-value map, but on 
 the other hand, it probably uses compression when storing them.
 



Re: Fat Client Commit Log

2012-06-24 Thread aaron morton
The fat client would still have some information in the system CF. 

Are the files big? Are they continually created?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/06/2012, at 8:07 AM, Frank Ng wrote:

 Hi All,
 
 We are using the Fat Client and notice that there are files written to the 
 commit log directory on the Fat Client.  Does anyone know what these files 
 are storing? Are these hinted handoff data?  The Fat Client has no files in 
 the data directory, as expected.
 
 thanks



Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Safdar Kureishy
Thanks.
Oh, I forgot to mention that I'm using Cassandra 1.1.0-beta2... in case that
question comes up.
Hoping someone can offer some more feedback on the likelihood of this
behavior ...
Thanks again,
Safdar




Re: how to reduce latency?

2012-06-24 Thread Safdar Kureishy
Hi Yan,
Did you manage to figure out what was causing the increasing latency on
your cluster? Was the resolution just to add more nodes, or something else?
Thanks,
Safdar
On Jun 13, 2012 2:40 PM, Yan Chunlu springri...@gmail.com wrote:

 I have three nodes running cassandra 0.7.4 for about two years, as shown
 below:
 10.x.x.x  Up  Normal  138.07 GB  33.33%  0
 10.x.x.x  Up  Normal  143.97 GB  33.33%  56713727820156410577229101238628035242
 10.x.x.x  Up  Normal  137.33 GB  33.33%  113427455640312821154458202477256070484

 the commitlog and data directories are on separate disks (Western
 Digital WD RE3 WD1002FBYS 1TB).


 as the data size grows, the read and write times keep increasing, slowing
 down the website frequently.

 Based on the experience that every time I use nodetool to maintain the
 nodes, it can take a very long time, consume a lot of system resources (and
 often ends nowhere), and make my web service very unstable, I really have no
 idea what to do. Upgrading doesn't seem to solve this either: I have a newer
 cluster with the same configuration but version 1.0.2, which also shows
 increasing latency, and the new system is also suffering from instability...

 just wondering, does that mean I must add more nodes (which is also a
 painful and slow path)?






Re: Starting cassandra with -D option

2012-06-24 Thread Greg Fausak
I did something similar for my installation, but I used ENV variables:
I created a directory on a machine (call this the master) with directories
for all of the distributions (call them slaves).  So, consider:

/master/slave1
/master/slave2
...
/master/slaven

then I rdist this to all of my slaves.

In the /master directory is all of the standard cassandra distribution.

In the /master/slave* directories is all of the machine-dependent stuff.

Also in /master I have a .profile with:

-bash-4.1$ cat /master/.profile
#
# shared cassandra distribution lives under $HOME/run
export CASSANDRA_HOME=$HOME/run
# short hostname, e.g. slave1.example.com -> slave1
SHOST=`hostname | sed s'/\..*//'`
# per-host config directory and include file
export CASSANDRA_CONF=$CASSANDRA_HOME/conf/$SHOST
export CASSANDRA_INCLUDE=$CASSANDRA_HOME/conf/$SHOST/cassandra.in.sh
. $CASSANDRA_HOME/conf/cassandra-env.sh
PATH=$HOME/run/bin:$PATH
echo 'to start cassandra type cassandra'


this leaves me with this environment on each slave (slave1 example):
-bash-4.1$ env | grep CAS
CASSANDRA_HOME=/usr/share/cassandra/run
CASSANDRA_CONF=/usr/share/cassandra/run/conf/slave1
CASSANDRA_INCLUDE=/usr/share/cassandra/run/conf/slave1/cassandra.in.sh

Using this technique I maintain my Cassandra cluster on one machine and
rdist it to the participants. Rdist makes each node independent.

-greg





Consistency Problem with Quorum consistencyLevel configuration

2012-06-24 Thread Jason Tang
Hi

I met a consistency problem when using Quorum for both read and
write.

I use MultigetSubSliceQuery to query rows from a super column with a limit
of 100, and then read them, delete them, and start another round.

But I found that a row which should have been deleted by the last query
still shows up in the next round's query.

Also, for a normal Column Family, I updated the value of one column
from status='FALSE' to status='TRUE', and the next time I queried it, the
status was still 'FALSE'.

More detail:

   - It does not happen every time (about 1 in 10,000)
   - The time between two rounds of queries is around 500 ms (but we found
   queries that happened 2 seconds after the first one still showing this
   consistency problem)
   - We use ntp as our cluster time synchronization solution.
   - We have 6 nodes, and the replication factor is 3

Some people say Cassandra is expected to have such problems, because a read
may be processed before a concurrent write inside Cassandra. But for two
seconds?! And if so, it is meaningless to have Quorum or other consistency
level configurations.

   So first of all, is this the correct behavior of Cassandra, and if not,
what data do we need to analyze for further investigation?
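
(For reference, the overlap argument behind quorum: with replication factor
N = 3, QUORUM is N/2 + 1 = 2, so a read touches R = 2 replicas and a write
W = 2 replicas, and R + W = 4 > N = 3 means every read quorum shares at
least one replica with every write quorum.)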

BRs
Ares


Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Jake Luciani
Hi Safdar,

If you want to get better utilization of the cluster, raise the
solandra.shards.at.once param in solandra.properties.
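
A minimal sketch of that change (the value here is only an illustration, not
a tuned recommendation):

# solandra.properties
solandra.shards.at.once=8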

-Jake







-- 
http://twitter.com/tjake


Re: Limited row cache size

2012-06-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
I was using the datastax build. Do they also have a 1.1 build?

On Mon, Jun 18, 2012 at 9:05 AM, aaron morton aa...@thelastpickle.com wrote:
 cassandra 1.1.1 ships with concurrentlinkedhashmap-lru-1.3.jar

 row_cache_size_in_mb starts life as an int but the byte size is stored as a
 long
 https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/CacheService.java#L143
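
 (i.e. the megabyte count fits in an int, but the byte count needs a long; a
 minimal illustration of the distinction, not the actual CacheService code:

 long capacityInBytes = rowCacheSizeInMb * 1024L * 1024L;)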

 Cheers


 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 15/06/2012, at 7:13 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

 hi,
 I configured my server with row_cache_size_in_mb: 1920

 When I started the server and checked JMX, it showed the capacity was
 set to 1024MB.

 I investigated further and found that the version of
 concurrentlinkedhashmap used is 1.2, which caps the capacity at 1GB.

 So, in cassandra 1.1 the max cache size I can use is 1GB.


 Digging deeper, I realized that throughout the API chain the cache
 size is passed around as an int, so even if I write my own
 CacheProvider the max size would be Integer.MAX_VALUE = 2GB.

 Unless cassandra changes the version of concurrentlinkedhashmap to 1.3
 and changes the signature to use a long for the size, we can't have a big
 cache. In my opinion 1GB is a really small size.

 So, even if I have bigger machines I can't really use them.



 --
 -
 Noble Paul





-- 
-
Noble Paul


Re: Limited row cache size

2012-06-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
Sorry, I meant the 1.1.1 build.




-- 
-
Noble Paul


Re: Weird behavior in Cassandra 1.1.0 - throwing unconfigured CF exceptions when the CF is present

2012-06-24 Thread Tharindu Mathew
Yes, it seems an error on our side.

Sorry for the noise.






-- 
Regards,

Tharindu

blog: http://mackiemathew.com/