Cassandra DSC installation fails due to some Python dependencies. How to rectify?

2014-02-17 Thread Ertio Lew
I am trying to install Cassandra dsc20, but the installation fails due to
some Python dependencies. How can I make this work?


root@server1:~# sudo apt-get install dsc20
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
  cassandra libjna-java libopts25 ntp python python-minimal
python-support python2.7
  python2.7-minimal
Suggested packages:
  libjna-java-doc ntp-doc apparmor python-doc python-tk python2.7-doc
binutils binfmt-support
Recommended packages:
  perl
The following NEW packages will be installed:
  cassandra dsc20 libjna-java libopts25 ntp python python-minimal
python-support python2.7
  python2.7-minimal
0 upgraded, 10 newly installed, 0 to remove and 0 not upgraded.
Need to get 17.1 MB of archives.
After this operation, 23.2 MB of additional disk space will be used.
Do you want to continue [Y/n]? y
Get:1 http://debian.datastax.com/community/ stable/main cassandra all
2.0.5 [14.3 MB]
Get:2 http://us.archive.ubuntu.com/ubuntu/ raring/main libopts25 amd64
1:5.17.1-1ubuntu2 [62.2 kB]
Get:3 http://us.archive.ubuntu.com/ubuntu/ raring/main ntp amd64
1:4.2.6.p5+dfsg-2ubuntu1 [614 kB]
Get:4 http://us.archive.ubuntu.com/ubuntu/ raring/universe libjna-java
amd64 3.2.7-4 [416 kB]
Get:5 http://us.archive.ubuntu.com/ubuntu/ raring-security/main
python2.7-minimal amd64 2.7.4-2ubuntu3.2 [1223 kB]
Get:6 http://debian.datastax.com/community/ stable/main dsc20 all
2.0.5-1 [1302 B]
Get:7 http://us.archive.ubuntu.com/ubuntu/ raring-security/main
python2.7 amd64 2.7.4-2ubuntu3.2 [263 kB]
Get:8 http://us.archive.ubuntu.com/ubuntu/ raring/main python-minimal
amd64 2.7.4-0ubuntu1 [30.8 kB]
Get:9 http://us.archive.ubuntu.com/ubuntu/ raring/main python amd64
2.7.4-0ubuntu1 [169 kB]
Get:10 http://us.archive.ubuntu.com/ubuntu/ raring/universe
python-support all 1.0.15 [26.7 kB]
Fetched 17.1 MB in 3s (4842 kB/s)
Selecting previously unselected package libopts25.
(Reading database ... 27688 files and directories currently installed.)
Unpacking libopts25 (from .../libopts25_1%3a5.17.1-1ubuntu2_amd64.deb) ...
Selecting previously unselected package ntp.
Unpacking ntp (from .../ntp_1%3a4.2.6.p5+dfsg-2ubuntu1_amd64.deb) ...
Selecting previously unselected package libjna-java.
Unpacking libjna-java (from .../libjna-java_3.2.7-4_amd64.deb) ...
Selecting previously unselected package python2.7-minimal.
Unpacking python2.7-minimal (from
.../python2.7-minimal_2.7.4-2ubuntu3.2_amd64.deb) ...
Selecting previously unselected package python2.7.
Unpacking python2.7 (from .../python2.7_2.7.4-2ubuntu3.2_amd64.deb) ...
Selecting previously unselected package python-minimal.
Unpacking python-minimal (from .../python-minimal_2.7.4-0ubuntu1_amd64.deb) ...
Selecting previously unselected package python.
Unpacking python (from .../python_2.7.4-0ubuntu1_amd64.deb) ...
Selecting previously unselected package python-support.
Unpacking python-support (from .../python-support_1.0.15_all.deb) ...
Selecting previously unselected package cassandra.
Unpacking cassandra (from .../cassandra_2.0.5_all.deb) ...
Selecting previously unselected package dsc20.
Unpacking dsc20 (from .../archives/dsc20_2.0.5-1_all.deb) ...
Processing triggers for man-db ...
Processing triggers for desktop-file-utils ...
Setting up libopts25 (1:5.17.1-1ubuntu2) ...
Setting up ntp (1:4.2.6.p5+dfsg-2ubuntu1) ...
 * Starting NTP server ntpd
 [ OK ]
Setting up libjna-java (3.2.7-4) ...
Setting up python2.7-minimal (2.7.4-2ubuntu3.2) ...
# Empty sitecustomize.py to avoid a dangling symlink
Traceback (most recent call last):
  File "/usr/lib/python2.7/py_compile.py", line 170, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/py_compile.py", line 162, in main
    compile(filename, doraise=True)
  File "/usr/lib/python2.7/py_compile.py", line 106, in compile
    with open(file, 'U') as f:
IOError: [Errno 2] No such file or directory: '/usr/lib/python2.7/sitecustomize.py'
dpkg: error processing python2.7-minimal (--configure):
 subprocess installed post-installation script returned error exit status 1
dpkg: dependency problems prevent configuration of python2.7:
 python2.7 depends on python2.7-minimal (= 2.7.4-2ubuntu3.2); however:
  Package python2.7-minimal is not configured yet.

dpkg: error processing python2.7 (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of python-minimal:
 python-minimal depends on python2.7-minimal (>= 2.7.4-1~); however:
  Package python2.7-minimal is not configured yet.

dpkg: error processing python-minimal (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of python:
 python depends on python2.7 (>= 2.7.4-1~); however:
  Package python2.7 is not configured yet.
 python depends on python-minimal (= 2.7.4-0ubuntu1); however:
  Package python-minimal is not configured yet.

dpkg: error processing python (--configure):
 dependency problem
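The failure is in the python2.7-minimal post-installation script: it tries to byte-compile /usr/lib/python2.7/sitecustomize.py, which does not exist on this system. A commonly reported workaround (a sketch, not a guaranteed fix; paths assume a standard Ubuntu layout) is to create the empty file yourself and let dpkg finish configuring:

```shell
# Create the empty sitecustomize.py the postinst script expects,
# then resume configuration of the half-installed packages.
sudo touch /usr/lib/python2.7/sitecustomize.py
sudo dpkg --configure -a
# Re-run the install so dsc20/cassandra finish configuring too.
sudo apt-get install -f
```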

Re: How do I upgrade a single Cassandra node in production to a 3-node cluster?

2014-02-16 Thread Ertio Lew
I just meant increasing the cluster size, not upgrading the Cassandra version.


On Mon, Feb 17, 2014 at 2:29 AM,  wrote:

> By upgrade do you mean only adding nodes or also moving up the version of
> C*?
>
>
> On Mon, Feb 17, 2014 at 2:23 AM, Erick Ramirez wrote:
>
>> Ertio,
>>
>> It's not so much upgrading, but simply adding more nodes to your existing
>> setup.
>>
>> Cheers,
>> Erick
>>
>>
>> On Sun, Feb 16, 2014 at 2:13 PM, Ertio Lew  wrote:
>>
>>> I started off with a single Cassandra node on my 2GB DigitalOcean VPS,
>>> but now I'm planning to upgrade it to a 3-node cluster. My single node
>>> contains around 10 GB of data spread across 10-12 column families.
>>>
>>> What should be the strategy to upgrade to a 3-node cluster, bearing
>>> in mind that my data must remain safe on this production server?
>>>
>>>
>>>
>>
>
>
> --
> http://spawgi.wordpress.com
> We can do it and do it better.
>


How do I upgrade a single Cassandra node in production to a 3-node cluster?

2014-02-15 Thread Ertio Lew
I started off with a single Cassandra node on my 2GB DigitalOcean VPS, but
now I'm planning to upgrade it to a 3-node cluster. My single node contains
around 10 GB of data spread across 10-12 column families.

What should be the strategy to upgrade to a 3-node cluster, bearing in
mind that my data must remain safe on this production server?
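As a sketch of the mechanics (not a definitive procedure): with Cassandra 2.0 and vnodes enabled (num_tokens in cassandra.yaml), a new node started with the same cluster_name, the existing node listed as a seed, and auto_bootstrap on will pick its token ranges and stream data automatically; afterwards run nodetool cleanup on the old node, and once all three are up, raise the keyspace replication factor and run nodetool repair so the existing data is replicated. If the node uses a single token rather than vnodes, each node needs an evenly spaced initial_token. A small illustration of that calculation (the function name is just for illustration):

```python
# Evenly spaced initial_token values for an n-node ring. Assumes
# Murmur3Partitioner (the Cassandra 2.0 default) or RandomPartitioner;
# check which partitioner cassandra.yaml actually specifies first.

def balanced_tokens(node_count, partitioner="Murmur3Partitioner"):
    """Return node_count tokens spread evenly over the partitioner's range."""
    if partitioner == "Murmur3Partitioner":
        # Token range is [-2**63, 2**63 - 1]
        return [-(2**63) + i * (2**64 // node_count) for i in range(node_count)]
    if partitioner == "RandomPartitioner":
        # Token range is [0, 2**127 - 1]
        return [i * (2**127 // node_count) for i in range(node_count)]
    raise ValueError("unknown partitioner: %s" % partitioner)

for token in balanced_tokens(3):
    print(token)
```

When growing from one node to three this way, the existing node typically keeps one of the computed positions (or is rebalanced with nodetool move afterwards) and the two new nodes take the others.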


Cassandra consuming too much memory in Ubuntu compared to Windows on the same machine.

2014-01-04 Thread Ertio Lew
I run a development Cassandra single-node server on both Ubuntu and Windows 8
on my dual-boot 4GB (RAM) machine.

Cassandra runs fine under Windows without any crashes or OOMs; however, on
Ubuntu on the same machine it always gives an OOM message:

$ sudo service cassandra start
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms4G -Xmx4G -Xmn800M
-XX:+HeapDumpOnOutOfMemoryError -Xss256k


Here is the memory usage for an empty Cassandra server on Ubuntu (from top):

  PID 1169  USER cassandr  PR 20  NI 0  VIRT 2639m  RES 1.3g  SHR 17m  S  %CPU 1  %MEM 33.9  TIME 0:53.80  COMMAND java

The memory usage while running under Windows, however, is very low relative
to this.

What is the reason behind this?

Also, how can I prevent these OOMs on Ubuntu? I am running DataStax's
DSC version 2.0.3.
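One thing visible in the startup line above: -Xms4G -Xmx4G asks the JVM for the machine's entire 4 GB of RAM as heap, which Linux cannot grant alongside the OS and Cassandra's off-heap allocations. A MAX_HEAP_SIZE override in /etc/cassandra/cassandra-env.sh would explain the difference from Windows, where the auto-sizing presumably applied. As a rough sketch of that auto-sizing (treating the formula from Cassandra 2.0's cassandra-env.sh calculate_heap_sizes as an assumption; check your own file):

```python
# Rough Python mirror of the heap auto-sizing in Cassandra 2.0's
# cassandra-env.sh (calculate_heap_sizes). The formula is an assumption
# here; verify it against your own cassandra-env.sh.

def auto_max_heap_mb(system_memory_mb):
    """max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))"""
    half = system_memory_mb // 2
    quarter = system_memory_mb // 4
    return max(min(half, 1024), min(quarter, 8192))

# On a 4 GB machine the auto-sized heap would be 1 GB, far below the
# 4 GB (-Xms4G -Xmx4G) shown in the startup line above.
print(auto_max_heap_mb(4096))  # 1024
```

If MAX_HEAP_SIZE and HEAP_NEWSIZE are set explicitly in /etc/cassandra/cassandra-env.sh, commenting them out (or setting something like MAX_HEAP_SIZE="1G") should let a 4 GB machine start without the allocation failure.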


Re: Why does Solandra store Solr data in Cassandra? Isn't Solr a complete solution?

2013-10-04 Thread Ertio Lew
Yes, and what is SolrCloud for, then? That already provides clustering
support, so what's the need for Cassandra?


On Tue, Oct 1, 2013 at 2:06 AM, Sávio Teles wrote:

>
> Solr's index sitting on a single machine, even if that single machine can
>> vertically scale, is a single point of failure.
>>
>
> And about Cloud Solr?
>
>
> 2013/9/30 Ken Hancock 
>
>> Yes.
>>
>>
>> On Mon, Sep 30, 2013 at 1:57 PM, Andrey Ilinykh wrote:
>>
>>>
>>> Also, be aware that while Cassandra has knobs to allow you to get
 consistent read results (CL=QUORUM), DSE Search does not. If a node drops
 messages for whatever reason, outtage, mutation, etc. its solr indexes will
 be inconsistent with other nodes in its replication group.

 Will repair fix it?
>>>
>>
>>
>>
>> --
>> *Ken Hancock *| System Architect, Advanced Advertising
>> SeaChange International
>> 50 Nagog Park
>> Acton, Massachusetts 01720
>> ken.hanc...@schange.com | www.schange.com | 
>> NASDAQ:SEAC
>>
>> Office: +1 (978) 889-3329 | Google Talk: ken.hanc...@schange.com | Skype: hancockks | Yahoo IM: hancockks | LinkedIn
>>
>>
>
>
>
> --
> Atenciosamente,
> Sávio S. Teles de Oliveira
> voice: +55 62 9136 6996
> http://br.linkedin.com/in/savioteles
>  Mestrando em Ciências da Computação - UFG
> Arquiteto de Software
> Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG
>


Re: What is the best way to install & upgrade Cassandra on Ubuntu ?

2013-10-03 Thread Ertio Lew
Thanks for the clarifications!
Btw, DSC installs OpenJDK when Java is not present on your system. I don't
know why it doesn't just include the preferred Oracle JRE and take care of
its later updates as well; that would make it a complete package to run
Cassandra and a reason to choose DSC over the official Apache Debian
package. Otherwise I can't see any strong reason to prefer it.


On Fri, Oct 4, 2013 at 4:34 AM, Daniel Chia  wrote:

> Opscenter is a separate package:
> http://www.datastax.com/documentation/opscenter/3.2/webhelp/index.html?pagename=docs&version=opscenter&file=index#opsc/install/opscInstallDeb_t.html
>
> Thanks,
> Daniel
>
>
> On Tue, Oct 1, 2013 at 8:11 PM, Aaron Morton wrote:
>
>> Does DSC include other things like Opscenter by default ?
>>
>> Not sure, I've normally installed it with an existing cluster.
>>
>> Would it be possible to remove any of these installations but keeping the
>> data intact & easily switch to the another, I mean switching from DSC
>> package to apache one or vice versa ?
>>
>> Yes.
>> Same code, same data.
>>
>> A
>>
>>  -
>> Aaron Morton
>> New Zealand
>> @aaronmorton
>>
>> Co-Founder & Principal Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> On 30/09/2013, at 9:58 PM, Ertio Lew  wrote:
>>
>> Thanks Aaron!
>>
>> Does DSC include other things like Opscenter by default ? I installed DSC
>> on linux, but Opscenter wasn't installed there but when tried on Windows it
>> was installed along with JRE & python, using the windows installer.
>>
>> Would it be possible to remove any of these installations but keeping the
>> data intact & easily switch to the another, I mean switching from DSC
>> package to apache one or vice versa ?
>>
>>
>> On Mon, Sep 30, 2013 at 1:10 PM, Aaron Morton wrote:
>>
>>> I am not sure if I should use datastax's DSC or official Debian packages
>>> from Cassandra. How do I choose between them for a production server ?
>>>
>>> They are technically the same.
>>> The DSC update will come out a little after the Apache release, and I
>>> _think_ they release for every Apache release.
>>>
>>>  1.  when I upgrade to a newer version, would that retain my previous
>>> configurations so that I don't need to configure everything again ?
>>>
>>> Yes if you select that when doing the package install.
>>>
>>> 2.  would that smoothly replace the previous installation by itself ?
>>>>
>>>
>>>> Yes
>>>>
>>>
>>> 3.  what's the way (kindly, if you can tell the command) to upgrade ?
>>>>
>>>
>>>>
>>> http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#upgrade/upgradeC_c.html#concept_ds_yqj_5xr_ck
>>>
>>> 4. when should I prefer datastax's dsc to that ? (I need to install for
>>>> production env.)
>>>>
>>> Above
>>>
>>> Hope that helps.
>>>
>>>
>>>  -
>>> Aaron Morton
>>> New Zealand
>>> @aaronmorton
>>>
>>> Co-Founder & Principal Consultant
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> On 27/09/2013, at 11:01 PM, Ertio Lew  wrote:
>>>
>>> I am not sure if I should use datastax's DSC or official Debian packages
>>> from Cassandra. How do I choose between them for a production server ?
>>>
>>>
>>>
>>> On Fri, Sep 27, 2013 at 11:02 AM, Ertio Lew  wrote:
>>>
>>>>
>>>>  Could you please clarify that:
>>>> 1.  when I upgrade to a newer version, would that retain my previous
>>>> configurations so that I don't need to configure everything again ?
>>>> 2.  would that smoothly replace the previous installation by itself ?
>>>> 3.  what's the way (kindly, if you can tell the command) to upgrade ?
>>>> 4. when should I prefer datastax's dsc to that ? (I need to install for
>>>> production env.)
>>>>
>>>>
>>>> On Fri, Sep 27, 2013 at 12:50 AM, Robert Coli wrote:
>>>>
>>>>> On Thu, Sep 26, 2013 at 12:05 PM, Ertio Lew wrote:
>>>>>
>>>>>> How do you install Cassandra on Ubuntu & later how do you upgrade the
>>>>>> installation on the node when an update has arrived ? Do you simply
>>>>>> download & replace the latest tar.gz, untar it to replace the older
>>>>>> cassandra files? How do you do it ? How does this upgrade process differ
>>>>>> for a major version upgrade, like say switching from 1.2 series to 2.0
>>>>>> series ?
>>>>>>
>>>>>
>>>>> Use the deb packages. To upgrade, install the new package. Only
>>>>> upgrade a single major version. and be sure to consult NEWS.txt for any
>>>>> upgrade caveats.
>>>>>
>>>>> Also be aware of this sub-optimal behavior of the debian packages :
>>>>>
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-2356
>>>>>
>>>>> =Rob
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>


Why does Solandra store Solr data in Cassandra? Isn't Solr a complete solution?

2013-09-30 Thread Ertio Lew
Solr's data is stored on the file system as a set of index files [
http://stackoverflow.com/a/7685579/530153]. Then why do we need anything
like Solandra or DataStax Enterprise Search? Isn't Solr a complete solution
in itself? Why do we need to integrate it with Cassandra?


Re: Among DataStax Community & the Cassandra Debian package, which to choose for a production install?

2013-09-30 Thread Ertio Lew
And what about the JRE: is it provided by DSC so that I don't need to take
care of Oracle JRE updates myself? Which one is more preferable, or let's
say more commonly used, for production installs?

Btw, I think I should be able to easily switch between them while retaining
the data?


On Mon, Sep 30, 2013 at 6:04 PM, Ken Hancock wrote:

> OpsCenter should be a separate package as you would only install it on a
> single node, not necessarily even one that is running Cassandra.
>
>
>
>
> On Sat, Sep 28, 2013 at 2:12 PM, Ertio Lew  wrote:
>
>> I think both provide the same thing, except DataStax Community also
>> provides some extras like OpsCenter, etc. But I cannot find OpsCenter
>> installed when I installed DSC on Ubuntu. On the Windows installation,
>> though, I saw OpsCenter and a JRE as well, so I think for DSC there is no
>> Oracle JRE prerequisite as there is for the Cassandra Debian package, is
>> that so?
>>
>> Btw which is usually preferred for production installs ?
>>
>> I may need to use Opscenter but just *occasionally*.
>>
>
>
>
>


Re: What is the best way to install & upgrade Cassandra on Ubuntu ?

2013-09-30 Thread Ertio Lew
Thanks Aaron!

Does DSC include other things like Opscenter by default ? I installed DSC
on linux, but Opscenter wasn't installed there but when tried on Windows it
was installed along with JRE & python, using the windows installer.

Would it be possible to remove any of these installations but keeping the
data intact & easily switch to the another, I mean switching from DSC
package to apache one or vice versa ?


On Mon, Sep 30, 2013 at 1:10 PM, Aaron Morton wrote:

> I am not sure if I should use datastax's DSC or official Debian packages
> from Cassandra. How do I choose between them for a production server ?
>
> They are technically the same.
> The DSC update will come out a little after the Apache release, and I
> _think_ they release for every Apache release.
>
>  1.  when I upgrade to a newer version, would that retain my previous
> configurations so that I don't need to configure everything again ?
>
> Yes if you select that when doing the package install.
>
> 2.  would that smoothly replace the previous installation by itself ?
>>
>
>> Yes
>>
>
> 3.  what's the way (kindly, if you can tell the command) to upgrade ?
>>
>
>>
> http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#upgrade/upgradeC_c.html#concept_ds_yqj_5xr_ck
>
> 4. when should I prefer datastax's dsc to that ? (I need to install for
>> production env.)
>>
> Above
>
> Hope that helps.
>
>
> -
> Aaron Morton
> New Zealand
> @aaronmorton
>
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On 27/09/2013, at 11:01 PM, Ertio Lew  wrote:
>
> I am not sure if I should use datastax's DSC or official Debian packages
> from Cassandra. How do I choose between them for a production server ?
>
>
>
> On Fri, Sep 27, 2013 at 11:02 AM, Ertio Lew  wrote:
>
>>
>>  Could you please clarify that:
>> 1.  when I upgrade to a newer version, would that retain my previous
>> configurations so that I don't need to configure everything again ?
>> 2.  would that smoothly replace the previous installation by itself ?
>> 3.  what's the way (kindly, if you can tell the command) to upgrade ?
>> 4. when should I prefer datastax's dsc to that ? (I need to install for
>> production env.)
>>
>>
>> On Fri, Sep 27, 2013 at 12:50 AM, Robert Coli wrote:
>>
>>> On Thu, Sep 26, 2013 at 12:05 PM, Ertio Lew  wrote:
>>>
>>>> How do you install Cassandra on Ubuntu & later how do you upgrade the
>>>> installation on the node when an update has arrived ? Do you simply
>>>> download & replace the latest tar.gz, untar it to replace the older
>>>> cassandra files? How do you do it ? How does this upgrade process differ
>>>> for a major version upgrade, like say switching from 1.2 series to 2.0
>>>> series ?
>>>>
>>>
>>> Use the deb packages. To upgrade, install the new package. Only upgrade
>>> a single major version. and be sure to consult NEWS.txt for any upgrade
>>> caveats.
>>>
>>> Also be aware of this sub-optimal behavior of the debian packages :
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-2356
>>>
>>> =Rob
>>>
>>>
>>
>
>


Among DataStax Community & the Cassandra Debian package, which to choose for a production install?

2013-09-28 Thread Ertio Lew
I think both provide the same thing, except DataStax Community also provides
some extras like OpsCenter, etc. But I cannot find OpsCenter installed when
I installed DSC on Ubuntu. On the Windows installation, though, I saw
OpsCenter and a JRE as well, so I think for DSC there is no Oracle JRE
prerequisite as there is for the Cassandra Debian package, is that so?

Btw which is usually preferred for production installs ?

I may need to use OpsCenter, but just occasionally.


Re: What is the best way to install & upgrade Cassandra on Ubuntu ?

2013-09-27 Thread Ertio Lew
I am not sure if I should use datastax's DSC or official Debian packages
from Cassandra. How do I choose between them for a production server ?



On Fri, Sep 27, 2013 at 11:02 AM, Ertio Lew  wrote:

>
>  Could you please clarify that:
> 1.  when I upgrade to a newer version, would that retain my previous
> configurations so that I don't need to configure everything again ?
> 2.  would that smoothly replace the previous installation by itself ?
> 3.  what's the way (kindly, if you can tell the command) to upgrade ?
> 4. when should I prefer datastax's dsc to that ? (I need to install for
> production env.)
>
>
> On Fri, Sep 27, 2013 at 12:50 AM, Robert Coli wrote:
>
>> On Thu, Sep 26, 2013 at 12:05 PM, Ertio Lew  wrote:
>>
>>> How do you install Cassandra on Ubuntu & later how do you upgrade the
>>> installation on the node when an update has arrived ? Do you simply
>>> download & replace the latest tar.gz, untar it to replace the older
>>> cassandra files? How do you do it ? How does this upgrade process differ
>>> for a major version upgrade, like say switching from 1.2 series to 2.0
>>> series ?
>>>
>>
>> Use the deb packages. To upgrade, install the new package. Only upgrade a
>> single major version. and be sure to consult NEWS.txt for any upgrade
>> caveats.
>>
>> Also be aware of this sub-optimal behavior of the debian packages :
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-2356
>>
>> =Rob
>>
>>
>


Re: What is the best way to install & upgrade Cassandra on Ubuntu ?

2013-09-26 Thread Ertio Lew
 Could you please clarify that:
1.  when I upgrade to a newer version, would that retain my previous
configurations so that I don't need to configure everything again ?
2.  would that smoothly replace the previous installation by itself ?
3.  what's the way (kindly, if you can tell the command) to upgrade ?
4. when should I prefer datastax's dsc to that ? (I need to install for
production env.)


On Fri, Sep 27, 2013 at 12:50 AM, Robert Coli  wrote:

> On Thu, Sep 26, 2013 at 12:05 PM, Ertio Lew  wrote:
>
>> How do you install Cassandra on Ubuntu & later how do you upgrade the
>> installation on the node when an update has arrived ? Do you simply
>> download & replace the latest tar.gz, untar it to replace the older
>> cassandra files? How do you do it ? How does this upgrade process differ
>> for a major version upgrade, like say switching from 1.2 series to 2.0
>> series ?
>>
>
> Use the deb packages. To upgrade, install the new package. Only upgrade a
> single major version. and be sure to consult NEWS.txt for any upgrade
> caveats.
>
> Also be aware of this sub-optimal behavior of the debian packages :
>
> https://issues.apache.org/jira/browse/CASSANDRA-2356
>
> =Rob
>
>


What is the best way to install & upgrade Cassandra on Ubuntu ?

2013-09-26 Thread Ertio Lew
How do you install Cassandra on Ubuntu & later how do you upgrade the
installation on the node when an update has arrived ? Do you simply
download & replace the latest tar.gz, untar it to replace the older
cassandra files? How do you do it ? How does this upgrade process differ
for a major version upgrade, like say switching from 1.2 series to 2.0
series ?
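For reference, a sketch of the deb-based install-and-upgrade flow (the repository line follows the apt logs elsewhere in this digest; verify the key URL and the package name, here dsc20, against the current DataStax docs before using):

```shell
# Add the DataStax community repository and install the DSC package.
echo "deb http://debian.datastax.com/community stable main" \
  | sudo tee /etc/apt/sources.list.d/datastax.sources.list
curl -L http://debian.datastax.com/debian/repo_key | sudo apt-key add -
sudo apt-get update
sudo apt-get install dsc20        # or the Apache 'cassandra' package

# Later, upgrading within one major series:
nodetool drain                    # flush memtables before stopping
sudo service cassandra stop
sudo apt-get update
sudo apt-get install dsc20        # keeps /etc/cassandra configs if you choose to
sudo service cassandra start
```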


Why don't you start off with a “single & small” Cassandra server as you usually do with MySQL?

2013-09-18 Thread Ertio Lew
For any website just starting out, the load is minimal and grows at a slow
pace initially. People usually start their MySQL-based sites with a single
server (often a VPS, not even a dedicated server) running as both app server
and DB server, get quite far with this setup, and only separate the DB onto
its own VPS when they feel the need. This is what a startup expects while
planning resource procurement.

But so far, what I have seen with Cassandra is very different. People
usually recommend starting out with at least a 3-node cluster (on dedicated
servers) with lots and lots of RAM; 4GB or 8GB is what they suggest to start
with. So does Cassandra require more hardware resources than MySQL for a
website to deliver similar performance and serve a similar load/traffic and
the same amount of data? I understand Cassandra's higher storage
requirements due to replication, but what about the other hardware
resources?

Can't we start off with Cassandra-based apps just like MySQL, starting with
1 or 2 VPSes and adding more whenever there's a need?

I don't want to compare apples with oranges. I just want to know how much
more dangerous a situation I may be in when I start out with a single-node
VPS-based Cassandra installation vs. a single-node VPS-based MySQL
installation. Are Cassandra servers more prone to being unavailable than
MySQL servers? What is bad about putting Tomcat alongside Cassandra, the way
people run a LAMP stack on a single server?

-


This question is also posted at StackOverflow and has an open bounty worth
+50 rep.


Maintain backup for single node cluster

2013-09-05 Thread Ertio Lew
I would like to have a single-node Cassandra cluster initially, but to
maintain backups for that single node, how about occasionally and
temporarily adding a second node to the cluster as a replica (one that would
hold the backup; this could be my dev machine, far from the first node in
some remote datacenter), so that data would be synchronized onto both?

Would it be possible to do this? Maybe I could do this backup once every
2-3 days.
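A temporary replica would work but couples backups to cluster membership. A lighter-weight alternative often used for single nodes is nodetool snapshot plus copying the snapshot off-box. A sketch (the keyspace name and the backup host are hypothetical; paths assume the Debian package's /var/lib/cassandra layout):

```shell
# Snapshot hard-links the current SSTables under .../snapshots/<tag>.
nodetool snapshot -t nightly mykeyspace
# Archive the snapshot directories and ship them to another machine.
tar czf /tmp/cassandra-nightly.tar.gz \
    /var/lib/cassandra/data/mykeyspace/*/snapshots/nightly
scp /tmp/cassandra-nightly.tar.gz backup-host:/backups/
# Drop the hard links once the copy is safely off-box.
nodetool clearsnapshot mykeyspace
```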


Re: CustomTThreadPoolServer.java: Error occurred during processing of message.

2013-08-29 Thread Ertio Lew
Running Cassandra (1.0.0 final) as a single node with default configurations
on a Windows dev machine. Using Hector.


On Thu, Aug 29, 2013 at 10:50 PM, Ertio Lew  wrote:

> I suddenly started to encounter this weird issue after writing some data
> to Cassandra. Didn't know exactly what was written before this or due to
> which this started happening.
>
>
>
> ERROR [pool-2-thread-30] 2013-08-29 19:55:24,778
> CustomTThreadPoolServer.java (line 205) Error occurred during processing of
> message.
>
> java.lang.StringIndexOutOfBoundsException: String index out of range: -
> 2147418111
>
>  at java.lang.String.checkBounds(String.java:397)
>
> at java.lang.String.<init>(String.java:442)
>
> at
> org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:339)
>
> at
> org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18958)
>
> at
> org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)
>
> at
> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
>
>  at
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>
> at java.lang.Thread.run(Thread.java:662)
>
> ERROR [pool-2-thread-31] 2013-08-29 19:55:24,910
> CustomTThreadPoolServer.java (line 205) Error occurred during processing of
> message.
>
> java.lang.StringIndexOutOfBoundsException: String index out of range: -
> 2147418111
>
>  at java.lang.String.checkBounds(String.java:397)
>
> at java.lang.String.<init>(String.java:442)
>
>  at
> org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:339)
>
>  at
> org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18958)
>
>  at
> org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)
>
>  at
> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
>
>  at
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
>
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>
>  at java.lang.Thread.run(Thread.java:662)
>
>
> Any ideas ??
>
>


CustomTThreadPoolServer.java: Error occurred during processing of message.

2013-08-29 Thread Ertio Lew
I suddenly started to encounter this weird issue after writing some data to
Cassandra. Didn't know exactly what was written before this or due to which
this started happening.



ERROR [pool-2-thread-30] 2013-08-29 19:55:24,778
CustomTThreadPoolServer.java (line 205) Error occurred during processing of
message.

java.lang.StringIndexOutOfBoundsException: String index out of range:
-2147418111

 at java.lang.String.checkBounds(String.java:397)

at java.lang.String.<init>(String.java:442)

at
org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:339)

at
org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18958)

at
org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)

at
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)

 at
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)

at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:662)

ERROR [pool-2-thread-31] 2013-08-29 19:55:24,910
CustomTThreadPoolServer.java (line 205) Error occurred during processing of
message.

java.lang.StringIndexOutOfBoundsException: String index out of range:
-2147418111

at java.lang.String.checkBounds(String.java:397)

at java.lang.String.<init>(String.java:442)

 at
org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:339)

 at
org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18958)

 at
org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)

 at
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)

 at
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)

 at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

 at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

 at java.lang.Thread.run(Thread.java:662)


Any ideas?
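One clue worth checking: the impossible "string length" -2147418111 is exactly what you get by reading the four bytes 80 01 00 01 (the Thrift TBinaryProtocol strict-mode version header plus a CALL message type) as a signed 32-bit integer. This is a hypothesis rather than a certain diagnosis, but it suggests the server hit the start of a new Thrift message where it expected a string length, i.e. the byte stream was desynchronized; common causes are a framed vs. unframed transport mismatch between client and server, or two threads writing to one connection. A quick check of the arithmetic:

```python
import struct

# Interpret the Thrift TBinaryProtocol strict header bytes (version word
# 0x8001, then message type CALL = 1) as a signed big-endian int32.
header = bytes([0x80, 0x01, 0x00, 0x01])
as_length = struct.unpack(">i", header)[0]
print(as_length)  # -2147418111, the exact "string length" in the error
```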


Re: Which of these VPS configurations would perform better for Cassandra ?

2013-08-06 Thread Ertio Lew
Amazon seems to overprice its services considerably. If you look at a
similar-size deployment elsewhere, like Linode or DigitalOcean (very
competitive pricing), you'll notice huge differences. OK, some services and
features are extra, but maybe we don't all need them, and if you can host on
non-dedicated virtual servers on Amazon, you can do it with
similar-configuration nodes elsewhere too.

IMO these huge costs associated with a Cassandra deployment are too heavy
for small startups just starting out. I believe a deployment of a similar
application using MySQL would be quite a bit cheaper/more affordable (though
I'm not exactly sure); at least you don't usually create a cluster from the
beginning. Probably we made a wrong decision choosing Cassandra considering
only its technological advantages.


Re: Which of these VPS configurations would perform better for Cassandra ?

2013-08-03 Thread Ertio Lew
@David:
Like all other start-ups, we cannot start with all dedicated servers for
Cassandra, so right now we have no better choice than a VPS :), but we can
definitely choose one from a suitable set of VPS configurations. Since we
are just starting out, could we initiate our cluster with 2 nodes (RF=2)
(KVM, 2GB RAM, 2 cores, 30GB SSD)? We won't be putting a very heavy load on
Cassandra for the next few months while we grow our user base, so this
choice is mainly based on pricing vs. configuration, as well as
DigitalOcean's good reputation in the community.


On Sun, Aug 4, 2013 at 12:53 AM, David Schairer wrote:

> I've run several lab configurations on linodes; I wouldn't run cassandra
> on any shared virtual platform for large-scale production, just because
> your IO performance is going to be really hard to predict.  Lots of people
> do, though -- depends on your cassandra loads and how consistent you need
> to have performance be, as well as how much of your working set will fit
> into memory.  Remember that linode significantly oversells their CPU as
> well.
>
> The release version of KVM, at least as of a few months ago, still doesn't
> support TRIM on SSD; that, plus the fact that you don't know how others
> will use SSDs or if their file systems will keep the SSDs healthy, means
> that SSD performance on KVM is going to be highly unpredictable.  I have
> not tested digitalocean, but I did test several other KVM+SSD shared-tenant
> hosting providers aggressively for cassandra a couple months ago; they all
> failed badly.
>
> Your mileage will vary considerably based on what you need out of
> cassandra, what your data patterns look like, and how you configure your
> system.  That said, I would use xen before KVM for high-performance IO.
>
> I have not run Cassandra in any volume on Amazon -- lots of folks have,
> and may have recommendations (including SSD) there for where it falls on
> the price/performance curve.
>
> --DRS
>
> On Aug 3, 2013, at 11:33 AM, Ertio Lew  wrote:
>
> > I am building a cluster (initially starting with a 2-3 node cluster). I
> have come across two seemingly good options for hosting, Linode & Digital
> Ocean. The VPS configuration for both is listed below:
> >
> >
> > Linode:-
> > --
> > XEN Virtualization
> > 2 GB RAM
> > 8 cores CPU (2x priority) (8 processor Xen instances)
> > 96 GB Storage
> >
> >
> > Digital Ocean:-
> > -
> > KVM Virtualization
> > 2GB Memory
> > 2 Cores
> > 40GB SSD Disk
> > Digital Ocean's VPS is at half the price of the above-listed Linode VPS.
> >
> >
> > Could you clarify which of these two VPS would be better as Cassandra
> nodes ?
> >
> >
>
>


Which of these VPS configurations would perform better for Cassandra ?

2013-08-03 Thread Ertio Lew
I am building a cluster (initially starting with a 2-3 node cluster). I
have come across two seemingly good options for hosting, Linode & Digital
Ocean. The VPS configuration for both is listed below:


Linode:-
--
XEN Virtualization
2 GB RAM
8 cores CPU (2x priority) (8 processor Xen instances)
96 GB Storage


Digital Ocean:-
-
KVM Virtualization
2GB Memory
2 Cores
40GB SSD Disk
Digital Ocean's VPS is at half the price of the above-listed Linode VPS.


Could you clarify which of these two VPS would be better as Cassandra nodes
?


Re:

2013-04-18 Thread Ertio Lew
I use hector


On Thu, Apr 18, 2013 at 1:35 PM, aaron morton wrote:

> > ERROR 08:40:42,684 Error occurred during processing of message.
> > java.lang.StringIndexOutOfBoundsException: String index out of range: -2147418111
> > at java.lang.String.checkBounds(String.java:397)
> > at java.lang.String.<init>(String.java:442)
> > at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:339)
> > at org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandr
> This is an error when the server is trying to read what the client has
> sent.
>
> > Is this caused due to my application putting any corrupted data?
> Looks that way. What client are you using ?
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/04/2013, at 3:21 PM, Ertio Lew  wrote:
>
> > I run Cassandra on a single Windows 8 machine for development needs.
> Everything has been working fine for several months, but just today I saw
> this error message in the Cassandra logs & all host pools were marked down.
> >
> >
> > ERROR 08:40:42,684 Error occurred during processing of message.
> > java.lang.StringIndexOutOfBoundsException: String index out of range: -2147418111
> > at java.lang.String.checkBounds(String.java:397)
> > at java.lang.String.<init>(String.java:442)
> > at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:339)
> > at org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18958)
> > at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)
> > at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
> > at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > at java.lang.Thread.run(Thread.java:662)
> >
> >
> > After restarting the server, everything worked fine again.
> > I am curious to know what this is related to. Is this caused by my
> application putting in corrupted data?
> >
> >
>
>


[no subject]

2013-04-17 Thread Ertio Lew
I run Cassandra on a single Windows 8 machine for development needs.
Everything has been working fine for several months, but just today I saw
this error message in the Cassandra logs & all host pools were marked down.



ERROR 08:40:42,684 Error occurred during processing of message.
java.lang.StringIndexOutOfBoundsException: String index out of range: -2147418111
at java.lang.String.checkBounds(String.java:397)
at java.lang.String.<init>(String.java:442)
at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:339)
at org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18958)
at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)
at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)


After restarting the server, everything worked fine again.
I am curious to know what this is related to. Is this caused by my
application putting in corrupted data?


Re: Seeking Schema guidance

2012-11-06 Thread Ertio Lew
Thoughts ?


On Tue, Nov 6, 2012 at 3:58 AM, Ertio Lew  wrote:

> I need to store (1) posts written by users, (2) activity data by other
> users on these posts & (3) some counters for each post, like view counts,
> like counts, etc. So for each post there are 3 categories of associated
> data: the original post data, which is stored in one CF using a single row
> per post; counters data, using 1 row per post in a counters-type CF; & for
> activity data, each user stores his own activity column for each post he
> reacted to & also stores the activity data of all his friends in a
> dedicated row per user.
>
>
> So here is my current schema plan :
>
> For Posts:
> -
> 1 CF with single row for each post
>
>
> For Counters:
> --
> 1 CF with single row for each post
>
>
> For Activities Data
> ---
>
> 1 CF with single row for each user
>
>
>
> Now, for showing the post at any time, I need all 3 categories of data,
> so I'm forced to read 3 CFs. So I have been wondering why I shouldn't
> try to merge this data into a single CF, as a materialized view in a
> single row, so that read queries could be made more efficiently.
>
> Here is the idea I have got:
>
> For each post I would store the post data (written once, never updated) +
> the activity data of all users on that post (written for each user at
> different times & possibly edited many times) in a 'single row'. Using the
> activity data of all users I can calculate all the counters data (by
> iterating over the activity columns), so I don't need to store that
> explicitly. So now, for reading some 10 posts at a time, I just need to
> read 10 rows. Also, I set a reasonable limit on the number of columns to
> read, so that if the post counters are too big I don't have to read all
> the columns; in those (less frequent) cases I perform a second query to
> read the counters from another CF. So most of the time I would enjoy
> reading from a single CF & a single row for each post. But another issue
> is that since that single row will contain the activity of several users
> (each column added to the row at a different time), that row might be
> spread over many SSTables. So which schema is better for me with respect
> to performance, the 1st one or the 2nd?
>
> Thanks.
>
>
>
>
>
>
>
>
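The second (materialized) layout above can be mocked up with a sorted map standing in for the wide row. This is only a sketch: the column names ("post", "act:&lt;userId&gt;") and activity values are illustrative assumptions, not the exact schema, but it shows how the counters fall out of iterating the activity columns.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class MaterializedPostRow {
    // Derive counters by iterating the activity columns of one wide row,
    // as the proposed schema does; returns counts per activity type.
    static Map<String, Integer> counters(NavigableMap<String, String> row) {
        Map<String, Integer> counts = new TreeMap<>();
        // "act:" .. "act;" bounds the activity columns (';' follows ':' in ASCII)
        for (String v : row.subMap("act:", "act;").values())
            counts.merge(v, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        row.put("post", "hello world");   // post data: written once, never updated
        row.put("act:u1", "like");        // one activity column per reacting user
        row.put("act:u2", "view");
        row.put("act:u3", "like");
        System.out.println(counters(row)); // {like=2, view=1}
    }
}
```

The half-open range trick (`"act:"` to `"act;"`) is the same idea a column slice would use against a bytes-ordered row.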


Re: Is it bad putting columns with composite or integer name in CF with ByteType comparator & validator ?

2012-11-01 Thread Ertio Lew
Thoughts, please ?


On Thu, Nov 1, 2012 at 7:12 PM, Ertio Lew  wrote:

> Would there be any harm, or are there any downsides, if I store columns
> with composite names or Integer-type names in a column family with a
> BytesType comparator & validator? I have observed that a BytesType
> comparator would also sort integer-named columns in a similar fashion to
> the IntegerType comparator, so why should I lock my CF to storing only
> Integer or composite named columns? It would be good if I could just mix
> different datatypes in the same column family, no!?
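The claim in this thread can be checked directly. Below is a minimal sketch of unsigned byte-wise comparison (the way a bytes-type comparator orders names) against fixed-width big-endian int encodings: numeric order is preserved for non-negative, same-width values, but negative two's-complement values sort after positive ones — one caveat to mixing types freely under a bytes comparator.

```java
import java.nio.ByteBuffer;

public class ByteOrderCheck {
    // Unsigned lexicographic comparison, as a bytes-type comparator would do
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) return cmp;
        }
        return a.length - b.length;
    }

    // Fixed-width big-endian encoding of an int column name
    static byte[] enc(int v) { return ByteBuffer.allocate(4).putInt(v).array(); }

    public static void main(String[] args) {
        // Non-negative, same-width values: byte order matches numeric order
        System.out.println(compareUnsigned(enc(3), enc(200)) < 0);  // true
        // Negative values break it: -1 encodes as 0xFFFFFFFF and sorts last
        System.out.println(compareUnsigned(enc(-1), enc(1)) < 0);   // false
    }
}
```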


Re: Option for ordering columns by timestamp in CF

2012-10-13 Thread Ertio Lew
@B. Todd Burruss:
Regarding the use cases, I think they are pretty common. At least I see
this usage very frequently in my project. Let's say the application needs
to store a timeline of bookmark activity by a user on certain items; if I
could store the activity data as columns (with the concerned item id as the
column name) & get them ordered by timestamp, then I could also fetch from
that row whether or not a particular item was bookmarked by the user.
Ordering columns by time is a very common requirement in any application,
therefore if such a mechanism were provided by Cassandra, it would be
really useful & convenient to app developers.

On Sat, Oct 13, 2012 at 8:50 PM, Martin Koch  wrote:

> One example could be to identify when a row was last updated. For example,
> if I have a column family for storing users, the row key is a user ID and
> the columns are values for that user, e.g. natural column names would be
> "firstName", "lastName", "address", etc; column names don't naturally
> include a date here.
>
> Sorting the columns by timestamp and picking the last would allow me to
> know when the row was last modified. (I could manually maintain a 'last
> modified' column as well, I know, but I'm just coming up with a use case :).
>
> /Martin Koch
>
>
> On Fri, Oct 12, 2012 at 11:39 PM, B. Todd Burruss wrote:
>
>> trying to think of a use case where you would want to order by
>> timestamp, and also have unique column names for direct access.
>>
>> not really trying to challenge the use case, but you can get ordering
>> by timestamp and still maintain a "name" for the column using
>> composites. if the first component of the composite is a timestamp,
>> then you can order on it.  when retrieved you will could have a "name"
>> in the second component .. and have dupes as long as the timestamp is
>> unique (use TimeUUID)
>>
>>
>> On Fri, Oct 12, 2012 at 7:20 AM, Derek Williams  wrote:
>> > You probably already know this but I'm pretty sure it wouldn't be a
>> trivial
>> > change, since to efficiently lookup a column by name requires the
>> columns to
>> > be ordered by name. A separate index would be needed in order to provide
>> > lookup by column name if the row was sorted by timestamp (which is the
>> > way Redis implements its sorted set).
>> >
>> >
>> > On Fri, Oct 12, 2012 at 12:13 AM, Ertio Lew  wrote:
>> >>
>> >> "Make column timestamps optional"- kidding me, right ?:)  I do
>> understand
>> >> that this wont be possible as then cassandra wont be able to
>> distinguish the
>> >> latest among several copies of same column. I dont mean that. I just
>> want
>> >> the while ordering the columns, Cassandra(in an optional mode per CF)
>> should
>> >> not look at column names(they will exist though but for retrieval
>> purposes
>> >> not for ordering) but instead Cassandra would order the columns by
>> looking
>> >> at the timestamp values(timestamps would exist!). So the change would
>> be
>> >> just to provide a mode in which cassandra, while ordering, uses
>> timestamps
>> >> instead of column names.
>> >>
>> >>
>> >> On Fri, Oct 12, 2012 at 2:26 AM, Tyler Hobbs 
>> wrote:
>> >>>
>> >>> Without thinking too deeply about it, this is basically equivalent to
>> >>> disabling timestamps for a column family and using timestamps for
>> column
>> >>> names, though in a very indirect (and potentially confusing) manner.
>>  So, if
>> >>> you want to open a ticket, I would suggest framing it as "make column
>> >>> timestamps optional".
>> >>>
>> >>>
>> >>> On Wed, Oct 10, 2012 at 4:44 AM, Ertio Lew 
>> wrote:
>> >>>>
>> >>>> I think Cassandra should provide an configurable option on per column
>> >>>> family basis to do columns sorting by time-stamp rather than column
>> names.
>> >>>> This would be really helpful to maintain time-sorted columns without
>> using
>> >>>> up the column name as time-stamps which might otherwise be used to
>> store
>> >>>> most relevant column names useful for retrievals. Very frequently we
>> need to
>> >>>> store data sorted in time order. Therefore I think this may be a very
>> >>>> general requirement & not specific to just my use-case alone.
&
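The composite trick suggested in this thread — timestamp as the first component, name as the second — can be mimicked with a plain sorted map. A TreeMap with a two-part comparator stands in for a composite-comparator CF here; the field and item names are illustrative.

```java
import java.util.Comparator;
import java.util.TreeMap;

public class CompositeByTime {
    // Composite column name: (timestamp, name). Ordering on the first
    // component gives time order; the name is still carried for retrieval.
    static final class Col {
        final long ts;
        final String name;
        Col(long ts, String name) { this.ts = ts; this.name = name; }
        @Override public String toString() { return ts + ":" + name; }
    }

    static TreeMap<Col, String> newRow() {
        return new TreeMap<>(Comparator.<Col>comparingLong(c -> c.ts)
                                       .thenComparing(c -> c.name));
    }

    public static void main(String[] args) {
        TreeMap<Col, String> row = newRow();
        row.put(new Col(30L, "item-a"), "bookmarked");
        row.put(new Col(10L, "item-b"), "liked");
        row.put(new Col(20L, "item-a"), "shared");
        // Iteration order is by timestamp, not by item name
        System.out.println(row.keySet()); // [10:item-b, 20:item-a, 30:item-a]
    }
}
```

As the reply notes, using a TimeUUID-style unique first component avoids duplicate-timestamp collisions; a plain long is used above only to keep the sketch short.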

Re: Option for ordering columns by timestamp in CF

2012-10-11 Thread Ertio Lew
"Make column timestamps optional" - kidding me, right? :)  I do understand
that this won't be possible, as then Cassandra won't be able to distinguish
the latest among several copies of the same column. I don't mean that. I
just want that, while ordering the columns, Cassandra (in an optional mode
per CF) should not look at column names (they would still exist, but for
retrieval purposes, not for ordering); instead Cassandra would order the
columns by looking at the timestamp values (timestamps would exist!). So
the change would be just to provide a mode in which Cassandra, while
ordering, uses timestamps instead of column names.

On Fri, Oct 12, 2012 at 2:26 AM, Tyler Hobbs  wrote:

> Without thinking too deeply about it, this is basically equivalent to
> disabling timestamps for a column family and using timestamps for column
> names, though in a very indirect (and potentially confusing) manner.  So,
> if you want to open a ticket, I would suggest framing it as "make column
> timestamps optional".
>
>
> On Wed, Oct 10, 2012 at 4:44 AM, Ertio Lew  wrote:
>
>> I think Cassandra should provide an configurable option on per column
>> family basis to do columns sorting by time-stamp rather than column names.
>> This would be really helpful to maintain time-sorted columns without using
>> up the column name as time-stamps which might otherwise be used to store
>> most relevant column names useful for retrievals. Very frequently we need
>> to store data sorted in time order. Therefore I think this may be a very
>> general requirement & not specific to just my use-case alone.
>>
>> Does it makes sense to create an issue for this ?
>>
>>
>>
>>
>> On Fri, Mar 25, 2011 at 2:38 AM, aaron morton wrote:
>>
>>> If you mean order by the column timestamp (as passed by the client),
>>> that is not possible.
>>>
>>> Can you use your own timestamps as the column name and store them as
>>> long values ?
>>>
>>> Aaron
>>>
>>> On 25 Mar 2011, at 09:30, Narendra Sharma wrote:
>>>
>>> > Cassandra 0.7.4
>>> > Column names in my CF are of type byte[] but I want to order columns
>>> by timestamp. What is the best way to achieve this? Does it make sense for
>>> Cassandra to support ordering of columns by timestamp as option for a
>>> column family irrespective of the column name type?
>>> >
>>> > Thanks,
>>> > Naren
>>>
>>>
>>
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>
>


Re: Option for ordering columns by timestamp in CF

2012-10-10 Thread Ertio Lew
I think Cassandra should provide a configurable option, on a per-column-
family basis, to sort columns by timestamp rather than by column name.
This would be really helpful for maintaining time-sorted columns without
using up the column names as timestamps, which might otherwise be used to
store the most relevant column names, useful for retrievals. Very
frequently we need to store data sorted in time order. Therefore I think
this may be a very general requirement & not specific to just my use-case
alone.

Does it make sense to create an issue for this?



On Fri, Mar 25, 2011 at 2:38 AM, aaron morton wrote:

> If you mean order by the column timestamp (as passed by the client), that
> is not possible.
>
> Can you use your own timestamps as the column name and store them as long
> values ?
>
> Aaron
>
> On 25 Mar 2011, at 09:30, Narendra Sharma wrote:
>
> > Cassandra 0.7.4
> > Column names in my CF are of type byte[] but I want to order columns by
> timestamp. What is the best way to achieve this? Does it make sense for
> Cassandra to support ordering of columns by timestamp as option for a
> column family irrespective of the column name type?
> >
> > Thanks,
> > Naren
>
>


Re: RF on per column family basis ?

2012-07-28 Thread Ertio Lew
I have heard that it is *not highly recommended* to create more than a
single keyspace for an application or on a single cluster!?

Moreover, I fail to understand why Cassandra puts this limitation of
setting the RF on the keyspace when, I guess, it makes more sense to do
this on a per-CF basis!?


Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread Ertio Lew
I want to read columns for a randomly selected list of userIds (completely
random). I fetch the data using userIds (which would be used as column
names in the case of a single row, or as row keys in the case of 1 row per
user) for a selected list of users. Assume that the application knows the
list of userIds which it has to demand from the DB.


Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread Ertio Lew
For each user in my application, I want to store a *value* that is queried
using the userId. So there is going to be one column per user (userId as
column name & *value* as column value). Now I want to store these columns
such that I can efficiently read the columns for at least 300-500 users in
a single read query.


Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread Ertio Lew
Actually these columns are 1 for each entity in my application & I need to
query, at any time, the columns for a list of 300-500 entities in one go.


Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-22 Thread Ertio Lew
I want to store hundreds of millions of columns (containing id1 to id2
mappings) in the DB &, at any single time, retrieve a set of about 200-500
columns based on the column names (id1) if they are in a single row, or
using row keys if each column is stored in a unique row.


If I put them in a single row:-

-> the disadvantage is that the number of columns is quite big, which
would lead to uneven load distribution, etc.
-> the plus factor is that I can easily read all the columns I want to
fetch, using column names, in a single row read


But if I store each of them in its own row:-

-> I will have to read hundreds of rows (300-500, or in rare cases up to
1000) at a single time; this may lead to bad read performance (!?)
-> it is a bit less space efficient


What schema should I go with ?
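A common middle ground between the two extremes — not suggested in this thread, purely an assumption-labeled sketch — is to shard the mapping across a fixed number of bucket rows derived deterministically from id1, so a batch read touches at most BUCKETS rows instead of one giant row or 500 distinct rows. The key format and bucket count below are hypothetical.

```java
import java.util.stream.LongStream;

public class BucketKeys {
    static final int BUCKETS = 64; // hypothetical bucket count; tune for cluster size

    // Deterministically map an id to one of BUCKETS row keys, so the same
    // id always lands in the same bucket row.
    static String bucketRowKey(long id1) {
        return "idmap:" + Math.floorMod(Long.hashCode(id1), BUCKETS);
    }

    public static void main(String[] args) {
        System.out.println(bucketRowKey(42L));
        // A batch of 500 ids resolves to at most BUCKETS distinct rows
        long distinct = LongStream.range(0, 500)
                .mapToObj(BucketKeys::bucketRowKey).distinct().count();
        System.out.println(distinct <= BUCKETS); // true
    }
}
```

Within each bucket row the columns are still looked up by name (id1), so the read path stays a bounded multiget rather than hundreds of single-row reads.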


Re: How do I add a custom comparator class to a cassandra cluster ?

2012-05-14 Thread Ertio Lew
@Brandon: I just created a JIRA issue to request that this type of
comparator be shipped with Cassandra.

It is about a UTF8 comparator that provides case-insensitive ordering of
columns.
See the issue here: https://issues.apache.org/jira/browse/CASSANDRA-4245

On Tue, May 15, 2012 at 11:14 AM, Brandon Williams  wrote:

> On Mon, May 14, 2012 at 1:11 PM, Ertio Lew  wrote:
> > I need to add a custom comparator to a cluster, to sort columns in a
> certain
> > customized fashion. How do I add the class to the cluster  ?
>
> I highly recommend against doing this, because you'll be locked in to
> your comparator and not have an easy way out.  I dare say if none of
> the currently available comparators meet your needs, you're doing
> something wrong.
>
> -Brandon
>


Re: How do I add a custom comparator class to a cassandra cluster ?

2012-05-14 Thread Ertio Lew
Can I put this comparator class in a separate new jar (with just this
single file), or does it have to be appended to the original jar along with
the other comparator classes?

On Tue, May 15, 2012 at 12:22 AM, Tom Duffield (Mailing Lists) <
tom.duffield.li...@gmail.com> wrote:

> Kirk is correct.
>
> --
> Tom Duffield (Mailing Lists)
> Sent with Sparrow <http://www.sparrowmailapp.com/?sig>
>
> On Monday, May 14, 2012 at 1:41 PM, Kirk True wrote:
>
> Disclaimer: I've never tried, but I'd imagine you can drop a JAR
> containing the class(es) into the lib directory and perform a rolling
> restart of the nodes.
>
> On 5/14/12 11:11 AM, Ertio Lew wrote:
>
> I need to add a custom comparator to a cluster, to sort columns in a
> certain customized fashion. How do I add the class to the cluster ?
>
>
>


How do I add a custom comparator class to a cassandra cluster ?

2012-05-14 Thread Ertio Lew
I need to add a custom comparator to a cluster, to sort columns in a
certain customized fashion. How do I add the class to the cluster?


How to make the search by columns in range case insensitive ?

2012-05-14 Thread Ertio Lew
I need to build a search-by-name index using entity names as column names
in a row. This data is split across several rows, using the first 3
characters of the entity name as the row key & the remaining part as the
column name; the column value contains the entity id.

But there is a problem: I'm storing this data in a CF using a bytes-type
comparator, and I need to make case-insensitive queries to retrieve 'n'
column names starting from a given point.
Any ideas about how I should do that?
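One workaround that keeps the bytes-type comparator, sketched below with a sorted map standing in for the row: fold the name to lower case before using it as the column name, and keep the original spelling in the column value next to the entity id (the "id123" value is made up for the example). Byte-ordered range queries then become effectively case-insensitive.

```java
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

public class CaseInsensitiveIndex {
    // Build the index row: lower-cased name as column name, original
    // spelling kept in the value next to a (made-up) entity id.
    static NavigableMap<String, String> index(String[] names) {
        NavigableMap<String, String> row = new TreeMap<>();
        for (String name : names)
            row.put(name.toLowerCase(Locale.ROOT), name + "|id123"); // "id123" illustrative
        return row;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = index(new String[] {"Apple", "apricot", "BANANA"});
        // A range scan from "ap" now matches regardless of the original case
        System.out.println(row.subMap("ap", "aq").keySet()); // [apple, apricot]
    }
}
```

The cost is one extra copy of the original-case name in the value; the benefit is that no custom comparator has to be deployed to the cluster.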


Re: Schema advice/help

2012-03-27 Thread Ertio Lew
@R. Verlangen:
You are suggesting keeping a single row for all activities & reading all
the columns from that row & then filtering, right!?

If done that way (instead of keeping it in 5 rows), I would need to
retrieve 100s-200s of columns from a single row, rather than just 50
columns if I keep them in 5 rows. Which of these two would be better: more
columns from a single row, OR fewer columns from multiple rows?

On Tue, Mar 27, 2012 at 2:27 PM, R. Verlangen  wrote:

> You can just get a slice range with as start "userId:" and no end.
>
>
> 2012/3/27 Maciej Miklas 
>
>> multiget would require Order Preserving Partitioner, and this can lead to
>> unbalanced ring and hot spots.
>>
>> Maybe you can use a secondary index on "itemtype" - it must have small
>> cardinality:
>> http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/
>>
>>
>>
>>
>> On Tue, Mar 27, 2012 at 10:10 AM, Guy Incognito wrote:
>>
>>> without the ability to do disjoint column slices, i would probably use 5
>>> different rows.
>>>
>>> userId:itemType -> activityId
>>>
>>> then it's a multiget slice of 10 items from each of your 5 rows.
>>>
>>>
>>> On 26/03/2012 22:16, Ertio Lew wrote:
>>>
>>>> I need to store activities by each user, on 5 items types. I always
>>>> want to read last 10 activities on each item type, by a user (ie, total
>>>> activities to read at a time =50).
>>>>
>>>> I am wanting to store these activities in a single row for each user so
>>>> that they can be retrieved in single row query, since I want to read all
>>>> the last 10 activities on each item.. I am thinking of creating composite
>>>> names appending "itemtype" : "activityId"(activityId is just timestamp
>>>> value) but then, I don't see about how to read the last 10 activities from
>>>> all itemtypes.
>>>>
>>>> Any ideas about schema to do this better way ?
>>>>
>>>
>>>
>>
>
>
> --
> With kind regards,
>
> Robin Verlangen
> www.robinverlangen.nl
>
>


Schema advice/help

2012-03-26 Thread Ertio Lew
I need to store activities by each user, on 5 item types. I always want to
read the last 10 activities on each item type, by a user (i.e., total
activities to read at a time = 50).

I want to store these activities in a single row per user so that they can
be retrieved in a single-row query, since I want to read the last 10
activities on each item type. I am thinking of creating composite names
appending "itemtype" : "activityId" (activityId is just a timestamp value),
but then I don't see how to read the last 10 activities from all item
types.

Any ideas about a schema to do this a better way?
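One reply in this thread suggests five "userId:itemType" rows with a multiget slice of 10 from each. That layout is mocked up below with one sorted map per row; the row-key format, timestamps and counts are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class LastTenPerType {
    // Newest-first slice of up to `limit` timestamps from one row,
    // mimicking a reversed slice query against a "userId:itemType" row.
    static List<Long> lastN(NavigableMap<Long, String> row, int limit) {
        List<Long> out = new ArrayList<>();
        for (Long ts : row.descendingKeySet()) {
            if (out.size() == limit) break;
            out.add(ts);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] itemTypes = {"post", "photo", "link", "video", "note"};
        for (String type : itemTypes) {
            NavigableMap<Long, String> row = new TreeMap<>();
            for (int i = 0; i < 25; i++)
                row.put(1000L + i, "activity");  // 25 activities on this item type
            List<Long> newest = lastN(row, 10);  // the reversed slice
            System.out.println("u1:" + type + " -> " + newest.size()
                    + " newest, first=" + newest.get(0));
        }
    }
}
```

A single multiget over the 5 row keys with a reversed slice of 10 yields the full 50-activity read in one round trip, which is the reply's point.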


Re: Adding Long type rows to a CF containing Integer(32) type row keys, without overlapping ?

2012-03-26 Thread Ertio Lew
I need to use the range beyond the integer32 range, so I am using Long to
write those keys. I am afraid this might lead to collisions with the
previously stored integer keys in the same CF, even if I leave out the
int32 range.

On Mon, Mar 26, 2012 at 10:51 PM, aaron morton wrote:

> without them overlapping/disturbing each other (assuming that keys lie in
> above domains) ?
>
> Not sure what you mean by overlapping.
>
> 42 as a int and 42 as a long are the same key.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 25/03/2012, at 9:47 PM, Ertio Lew wrote:
>
> I have been writing rows to a CF all with integer(4 byte) keys. So my CF
> contains rows with keys in the entire range from Integer.MIN_VALUE to
> Integer.MAX_VALUE.
>
> Now I want to store Long type keys as well in this CF **without disturbing
> the integer keys. The range of Long type keys would be excluding the
> integers's range ie  (-2^63  to -2^31) and (2^31 to 2^63).
>
> Would it be safe to mix the integer & long keys in single CF without them
> overlapping/disturbing each other (assuming that keys lie in above domains)
> ?
>
>
>


Re: Fwd: information on cassandra

2012-03-25 Thread Ertio Lew
I guess a 2-node cluster with RF=2 might also be a starting point, isn't
it? Are there any issues with this?

On Sun, Mar 25, 2012 at 12:20 AM, samal  wrote:

> Cassandra has a distributed architecture, so 1 node does not fit into it.
> Although it can be used that way, you lose its benefits; it's OK if you
> are just playing around. Use a VM to learn how the cluster communicates
> and handles requests.
>
> To get full tolerance, redundancy and consistency, a minimum of 3 nodes
> is required.
>
> Important reading here:
> http://wiki.apache.org/cassandra/
> http://www.datastax.com/docs/1.0/index
> http://thelastpickle.com/
> http://www.acunu.com/blogs/all/
>
>
>
> On Sat, Mar 24, 2012 at 11:37 PM, Garvita Mehta wrote:
>
>> It's not advisable to use Cassandra on a single node, as its basic
>> definition says that if a node fails, data still remains in the system;
>> at least 3 nodes must be there while setting up a Cassandra cluster.
>>
>>
>> Garvita Mehta
>> CEG - Open Source Technology Group
>> Tata Consultancy Services
>> Ph:- +91 22 67324756
>> Mailto: garvita.me...@tcs.com
>> Website: http://www.tcs.com
>> 
>> Experience certainty. IT Services
>> Business Solutions
>> Outsourcing
>> 
>>
>> ----- puneet loya wrote: -----
>>
>> To: user@cassandra.apache.org
>> From: puneet loya 
>> Date: 03/24/2012 06:36PM
>> Subject: Fwd: information on cassandra
>>
>>
>>
>>
>> hi,
>>
>> I m puneet, an engineering student. I would like to know that, is
>> cassandra useful considering we just have a single node(rather a single
>> system) having all the information.
>> I m looking for decent response time for the database. can you please
>> respond?
>>
>> Thank you ,
>>
>> Regards,
>>
>> Puneet Loya
>>
>>
>>
>


Adding Long type rows to a CF containing Integer(32) type row keys, without overlapping ?

2012-03-25 Thread Ertio Lew
I have been writing rows to a CF, all with integer (4-byte) keys. So my CF
contains rows with keys in the entire range from Integer.MIN_VALUE to
Integer.MAX_VALUE.

Now I want to store Long-type keys in this CF as well, *without disturbing*
the integer keys. The range of the Long-type keys would exclude the
integers' range, i.e. (-2^63 to -2^31) and (2^31 to 2^63).

Would it be safe to mix the integer & long keys in a single CF without
them overlapping/disturbing each other (assuming that the keys lie in the
above domains)?
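Whether the two key families can collide depends entirely on how the client serializes them — the sketch below assumes fixed-width big-endian encodings, which is an assumption about your serializers, not something Cassandra itself guarantees. Under that assumption a 4-byte int key and an 8-byte long key never compare byte-equal, even for the same numeric value; a client that widens ints to 8 bytes, by contrast, makes 42-as-int and 42-as-long the same key, which is the reading under which Aaron's reply holds.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class KeyWidths {
    // Fixed-width big-endian encodings (assumed client serialization)
    static byte[] intKey(int v)   { return ByteBuffer.allocate(4).putInt(v).array(); }
    static byte[] longKey(long v) { return ByteBuffer.allocate(8).putLong(v).array(); }

    public static void main(String[] args) {
        // Same numeric value, different byte widths -> different raw keys
        System.out.println(Arrays.equals(intKey(42), longKey(42L))); // false
        // Widening the int to a long first makes them identical
        System.out.println(Arrays.equals(longKey(42), longKey(42L))); // true
    }
}
```

So the safe answer to the thread's question hinges on pinning down one consistent key serialization, not on the numeric ranges alone.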


Re: Using cassandra at minimal expenditures

2012-03-01 Thread Ertio Lew
Expensive :-) I was expecting to start with 2GB nodes, if not 1GB
initially.

On Thu, Mar 1, 2012 at 3:43 PM, aaron morton wrote:

> As others said, depends on load and traffic and all sorts of thins.
>
> if you want a number, 4Gb would me a reasonable minimum IMHO. (You may get
> by with less).  8Gb is about the tops.
> Any memory not allocated to Cassandra  will be used to map files into
> memory.
>
> If you can get machines with 8GB ram thats a reasonable start.
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 1/03/2012, at 1:16 AM, Maki Watanabe wrote:
>
> Depends on your traffic :-)
>
> cassandra-env.sh will try to allocate heap with following formula if
> you don't specify MAX_HEAP_SIZE.
> 1. calculate 1/2 of RAM on your system and cap to 1024MB
> 2. calculate 1/4 of RAM on your system and cap to 8192MB
> 3. pick the larger value
>
> So how about to start with the default? You will need to monitor the
> heap usage at first.
>
> 2012/2/29 Ertio Lew :
> Thanks, I think I don't need high consistency (as per my app requirements)
> so I might be fine with CL.ONE instead of quorum, so I think I'm probably
> going to be ok with a 2 node cluster initially..
>
> Could you guys also recommend some minimum memory to start with? Of course
> that would depend on my workload as well, but that's why I am asking for
> the min
>
> On Wed, Feb 29, 2012 at 7:40 AM, Maki Watanabe wrote:
>
> If you run your service with 2 node and RF=2, your data will be
> replicated but your service will not be redundant. ( You can't stop both of nodes )
>
> If your service doesn't need strong consistency ( allow cassandra returns
> "old" data after write, and possible write lost ), you can use CL=ONE
> for read and write to keep availability.
>
> maki
>
> --
> w3m
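The cassandra-env.sh heuristic quoted above transcribes directly into code (all values in MB): take half of RAM capped at 1024, a quarter of RAM capped at 8192, and pick the larger.

```java
public class DefaultHeap {
    // Mirrors the quoted cassandra-env.sh default-heap heuristic:
    // max( min(ram/2, 1024MB), min(ram/4, 8192MB) )
    static long defaultHeapMb(long systemRamMb) {
        long half    = Math.min(systemRamMb / 2, 1024);
        long quarter = Math.min(systemRamMb / 4, 8192);
        return Math.max(half, quarter);
    }

    public static void main(String[] args) {
        System.out.println(defaultHeapMb(2048));  // 1024 (a 2GB node gets a 1GB heap)
        System.out.println(defaultHeapMb(8192));  // 2048
        System.out.println(defaultHeapMb(65536)); // 8192 (the cap)
    }
}
```

So on the 2GB nodes discussed in this thread, the default would hand Cassandra a 1GB heap and leave the rest for the OS page cache.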


Re: Using cassandra at minimal expenditures

2012-02-28 Thread Ertio Lew
Thanks, I think I don't need high consistency (as per my app requirements),
so I might be fine with CL.ONE instead of quorum; so I think I'm probably
going to be OK with a 2-node cluster initially.

Could you guys also recommend some minimum memory to start with? Of course
that would depend on my workload as well, but that's why I am asking for
the min.

On Wed, Feb 29, 2012 at 7:40 AM, Maki Watanabe wrote:

> If you run your service with 2 node and RF=2, your data will be
> replicated but
> your service will not be redundant. ( You can't stop both of nodes )
>
> If your service doesn't need strong consistency ( allow cassandra returns
> "old" data after write, and possible write lost ), you can use CL=ONE
> for read and write
> to keep availability.
>
> maki
>


Re: Using cassandra at minimal expenditures

2012-02-28 Thread Ertio Lew
@Aaron: Are you suggesting 3 nodes (rather than 2) to allow quorum
operations even with the temporary loss of 1 node from the cluster's
reach? I understand this, but another question just popped up in my mind.
Probably since I'm not very experienced managing Cassandra, I'm unaware
whether it is usual for some of the n nodes of a cluster to be
down/unresponsive or out of the cluster's reach. (I actually considered
this situation an exceptional circumstance, not a normal one!)


On Tue, Feb 28, 2012 at 2:34 AM, aaron morton wrote:

> *1. *I am wondering *what is the minimum recommended cluster size to
> start with*?
>
> IMHO 3
> http://thelastpickle.com/2011/06/13/Down-For-Me/
>
> A
>
>   -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 28/02/2012, at 8:17 AM, Ertio Lew wrote:
>
> Hi
>
> I'm creating a networking site using cassandra. I want to host this
> application initially with the lowest possible resources & then slowly
> increase the resources as per the service's demand & need.
>
> *1. *I am wondering *what is the minimum recommended cluster size to
> start with*?
> Are there any issues if I start with as little as 2 nodes in the cluster?
> In that case I guess I would have a replication factor of 2.
> (this way I would require at min. 3 VPS, 1 as web server & 2 for the
> cassandra cluster, right?)
>
> *2.* Is anyone using cassandra with such minimal resources in
> production environments ? Any experiences or difficulties encountered ?
>
> *3.* Would you like to recommend some hosting service suitable
> for me, or suggest some other ways to minimize the resources (actually
> the hosting expenses)?
>
>
>


Using cassandra at minimal expenditures

2012-02-27 Thread Ertio Lew
Hi

I'm creating a networking site using cassandra. I want to host this
application initially with the lowest possible resources & then slowly
increase the resources as per the service's demand & need.

*1. *I am wondering *what is the minimum recommended cluster size to start
with*?
Are there any issues if I start with as little as 2 nodes in the cluster?
In that case I guess I would have a replication factor of 2.
(this way I would require at min. 3 VPS, 1 as web server & 2 for the
cassandra cluster, right?)

*2.* Is anyone using cassandra with such minimal resources in
production environments ? Any experiences or difficulties encountered ?

*3.* Would you like to recommend some hosting service suitable for
me, or suggest some other ways to minimize the resources (actually the
hosting expenses)?


Re: Any tools like phpMyAdmin to see data stored in Cassandra ?

2012-01-29 Thread Ertio Lew
On Mon, Jan 30, 2012 at 7:16 AM, Frisch, Michael
wrote:

>  OpsCenter?
>
>  http://www.datastax.com/products/opscenter
>
>  - Mike
>
>
>  I have tried Sebastien's phpMyAdmin for Cassandra to
> see the data stored in Cassandra in the same manner as phpMyAdmin allows.
> But since it makes assumptions about the datatypes of the column
> name/column value & doesn't allow configuring the datatype the data
> should be read as on a per-CF basis, I couldn't make the best use of it.
>
>  Are there any similar other tools out there that can do the job better ?
>

Thanks, that's a great product, but unfortunately it doesn't work on
Windows. Any tools for Windows ?


Any tools like phpMyAdmin to see data stored in Cassandra ?

2012-01-29 Thread Ertio Lew
I have tried Sebastien's phpMyAdmin for Cassandra to
see the data stored in Cassandra in the same manner as phpMyAdmin allows.
But since it makes assumptions about the datatypes of the column
name/column value & doesn't allow configuring the datatype the data should
be read as on a per-CF basis, I couldn't make the best use of it.

Are there any other similar tools out there that can do the job better ?


Re: Using 5-6 bytes for cassandra timestamps vs 8…

2012-01-19 Thread Ertio Lew
It obviously won't matter in case your columns are fat, but in several
cases (at least I could think of several) you need to, for example, just
store an integer column name & an empty column value. Thus 12 bytes for
the column, where 8 bytes is just the overhead to store the timestamp,
doesn't look very nice. And skinny columns are a very common use-case, I
believe.

On Thu, Jan 19, 2012 at 1:26 PM, Maxim Potekhin  wrote:

> I must have accidentally deleted all messages in this thread save this one.
>
> On the face value, we are talking about saving 2 bytes per column. I know
> it can add up with many columns, but relative to the size of the column --
> is it THAT significant?
>
> I made an effort to minimize my CF footprint by replacing the "natural"
> column keys with integers (and translating back and forth when writing and
> reading). It's easy to see that in my case I achieve almost 50% storage
> savings and at least 30%. But if the column in question contains more than
> 20 bytes -- what's up with trying to save 2?
>
> Cheers
>
> Maxim
>
>
>
> On 1/18/2012 11:49 PM, Ertio Lew wrote:
>
>> I believe the timestamps *on a per-column basis* are only required until
>> compaction time; after that it may also work if the timestamp range
>> could be specified globally on a per-SSTable basis. Thus the
>> timestamps until compaction only need to measure the time
>> from the initialization of the new memtable to the point the column is
>> written to that memtable. You can easily fit that time in 4
>> bytes. This I believe would save at least 4 bytes of overhead for each
>> column.
>>
>> Is anything related to these overheads under consideration/ or planned
>> in the roadmap ?
>>
>>
>>
>> On Tue, Sep 6, 2011 at 11:44 AM, Oleg Anastastasyev wrote:
>>
>>> I have a patch for trunk which I just have to get time to test a bit
>>>> before I
>>>>
>>> submit.
>>>
>>>> It is for super columns and will use the super columns timestamp as the
>>>> base
>>>>
>>> and only store variant encoded offsets in the underlying columns.
>>> Could you please measure how much real benefit it brings (in real RAM
>>> consumption by JVM). It is hard to tell will it give noticeable results
>>> or not.
>>> AFAIK memory structures used for memtable consume much more memory. And
>>> 64-bit
>>> JVM allocates memory aligned to 64-bit word boundary. So 37% of memory
>>> consumption reduction looks doubtful.
>>>
>>>
>>>
>


Re: Using 5-6 bytes for cassandra timestamps vs 8…

2012-01-18 Thread Ertio Lew
I believe the timestamps *on a per-column basis* are only required until
compaction time; after that it may also work if the timestamp range
could be specified globally on a per-SSTable basis. Thus the
timestamps until compaction only need to measure the time
from the initialization of the new memtable to the point the column is
written to that memtable. You can easily fit that time in 4
bytes. This I believe would save at least 4 bytes of overhead for each
column.

Is anything related to these overheads under consideration or planned
in the roadmap ?
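A minimal sketch of the delta idea described above: a shared 64-bit base timestamp per memtable/SSTable plus a 4-byte unsigned offset per column. This is an illustration of the proposal, not anything Cassandra actually implements, and all names are invented. A 32-bit millisecond offset covers roughly 49.7 days, far longer than a memtable normally lives before flushing.

```java
public class DeltaTimestamp {
    // One 64-bit base timestamp would be stored per memtable/SSTable;
    // each column then stores only a 4-byte unsigned offset from it.
    static int encodeOffset(long baseMillis, long columnMillis) {
        long delta = columnMillis - baseMillis;
        if (delta < 0 || delta > 0xFFFFFFFFL)
            throw new IllegalArgumentException("timestamp outside the 32-bit window");
        return (int) delta; // reinterpreted as unsigned when read back
    }

    static long decode(long baseMillis, int offset) {
        return baseMillis + (offset & 0xFFFFFFFFL); // undo the unsigned cast
    }
}
```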



On Tue, Sep 6, 2011 at 11:44 AM, Oleg Anastastasyev  wrote:
>
>>
>> I have a patch for trunk which I just have to get time to test a bit before I
> submit.
>> It is for super columns and will use the super columns timestamp as the base
> and only store variant encoded offsets in the underlying columns.
>>
>
> Could you please measure how much real benefit it brings (in real RAM
> consumption by JVM). It is hard to tell will it give noticeable results or 
> not.
> AFAIK memory structures used for memtable consume much more memory. And 64-bit
> JVM allocates memory aligned to 64-bit word boundary. So 37% of memory
> consumption reduction looks doubtful.
>
>


Re: Composite column names: How much space do they occupy ?

2012-01-02 Thread Ertio Lew
Yes, that makes a lot of sense! On using the remaining() method I see the
proper expected sizes.
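Sylvain's point about array().length vs remaining() is easy to reproduce without Hector at all; a plain JDK ByteBuffer with an over-sized backing array shows the same mismatch (the 256-byte allocation here is just an illustrative guess at what the client library was doing):

```java
import java.nio.ByteBuffer;

public class BufferLengthDemo {
    public static void main(String[] args) {
        // Over-allocate the way a serializer might, then write one int.
        ByteBuffer bb = ByteBuffer.allocate(256);
        bb.putInt(-165376575);
        bb.flip(); // position = 0, limit = 4

        System.out.println(bb.array().length); // 256: size of the backing array
        System.out.println(bb.remaining());    // 4: the effective serialized length
    }
}
```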


On Mon, Jan 2, 2012 at 5:26 PM, Sylvain Lebresne wrote:

> I am not familiar enough with Hector to tell you if it is doing something
> special here, but note that:
>
> 1) you may have better luck getting that kind of question answered
> quickly by using the Hector mailing list.
>
> 2) that may or may not change what you're seeing (since again I don't
> know what Hector is actually doing), but "bb.array().length" is not a
> reliable way to get the effective length of a ByteBuffer, as it is
> perfectly
> legit to have a byte buffer only use parts of it's underlying array. You
> should use the remaining() method instead.
>
> --
> Sylvain
>
> On Mon, Jan 2, 2012 at 12:29 PM, Ertio Lew  wrote:
> > Sorry I forgot to tell that I'm using Hector to communicate with
> Cassandra.
> > CS.toByteBuffer  is to convert the composite type name to ByteBuffer.
> >
> > Can anyone aware of Hector API enlighten me why am I seeing this size for
> > the composite type names.
> >
> >
> > On Mon, Jan 2, 2012 at 2:52 PM, aaron morton 
> > wrote:
> >>
> >> What is the definition of the composite type and what is CS.toByteBuffer
> >> ?
> >>
> >> CompositeTypes have a small overhead
> >> see
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/marshal/CompositeType.java
> >>
> >> Hope that helps.
> >> Aaron
> >>
> >> -
> >> Aaron Morton
> >> Freelance Developer
> >> @aaronmorton
> >> http://www.thelastpickle.com
> >>
> >> On 2/01/2012, at 6:25 PM, Ertio Lew wrote:
> >>
> >> I am storing composite column names which are made up of two integer
> >> components. However I am shocked after seeing the storage overhead of
> these.
> >>
> >> I just tried out a composite name (with single integer component):
> >>
> >>   Composite composite = new Composite();
> >>   composite.addComponent(-165376575,is);
> >>
> >> System.out.println(CS.toByteBuffer( composite ).array().length); // the
> >> result is 256
> >>
> >>
> >> After writing & then reading back this composite column from cassandra:
> >>
> >>
> >>
> System.out.println(CS.toByteBuffer( readColumn.getName() ).array().length);
> >> // the result is 91
> >>
> >>
> >> What is the storage overhead? I am quite sure that I'm making a
> >> mistake in interpreting the actual values.
> >>
> >>
> >
>


Re: Composite column names: How much space do they occupy ?

2012-01-02 Thread Ertio Lew
Sorry, I forgot to mention that I'm using Hector to communicate with
Cassandra. CS.toByteBuffer is used to convert the composite type name to a
ByteBuffer.

Can anyone familiar with the Hector API enlighten me as to why I am seeing
this size for the composite type names?

On Mon, Jan 2, 2012 at 2:52 PM, aaron morton wrote:

> What is the definition of the composite type and what is CS.toByteBuffer ?
>
> CompositeTypes have a small overhead see
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/marshal/CompositeType.java
>
> Hope that helps.
> Aaron
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 2/01/2012, at 6:25 PM, Ertio Lew wrote:
>
> I am storing composite column names which are made up of two integer
> components. However I am shocked after seeing the storage overhead of these.
>
> I just tried out a composite name (with single integer component):
>
>   Composite composite = new Composite();
>   composite.addComponent(-165376575,is);
>
> System.out.println(CS.toByteBuffer( composite ).array().length); // the
> result is 256
>
>
> After writing & then reading back this composite column from cassandra:
>
> System.out.println(CS.toByteBuffer( readColumn.getName() ).array().length);
> // the result is 91
>
>
> What is the storage overhead? I am quite sure that I'm making a
> mistake in interpreting the actual values.
>
>
>


Composite column names: How much space do they occupy ?

2012-01-01 Thread Ertio Lew
I am storing composite column names which are made up of two integer
components. However, I am shocked by the apparent storage overhead of
these.

I just tried out a composite name (with a single integer component):

  Composite composite = new Composite();
  composite.addComponent(-165376575, is);

System.out.println(CS.toByteBuffer( composite ).array().length); // the
result is 256


After writing & then reading back this composite column from cassandra:

System.out.println(CS.toByteBuffer( readColumn.getName() ).array().length);
// the result is 91


What is the storage overhead? I am quite sure that I'm making a
mistake in interpreting the actual values.
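For reference, a sketch of what the overhead should be. As I read the CompositeType.java source linked from Aaron's reply in this thread, each component is encoded as a 2-byte length, the component bytes, and a 1-byte end-of-component marker, so a composite of two 4-byte integers should serialize to 14 bytes, nowhere near 256. The helper below is illustrative only, not Hector's or Cassandra's code:

```java
import java.nio.ByteBuffer;

public class CompositeSize {
    // Illustrative encoder following CompositeType's layout as I understand it:
    // <2-byte length><component bytes><1-byte end-of-component>, per component.
    static ByteBuffer encodeTwoInts(int a, int b) {
        ByteBuffer bb = ByteBuffer.allocate(2 * (2 + 4 + 1)); // 14 bytes total
        for (int v : new int[] { a, b }) {
            bb.putShort((short) 4); // length of an int component
            bb.putInt(v);           // the component itself
            bb.put((byte) 0);       // end-of-component marker
        }
        bb.flip();
        return bb;
    }

    public static void main(String[] args) {
        // 3 bytes of framing per component: 6 bytes of overhead on 8 bytes of data.
        System.out.println(encodeTwoInts(-165376575, 42).remaining());
    }
}
```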


Doubts related to composite type column names/values

2011-12-20 Thread Ertio Lew
With regard to the composite columns stuff in Cassandra, I have the
following doubts :

1. What is the storage overhead of the composite type column names/values?

2. What exactly is the difference between the DynamicComposite and the
static Composite ?


Retrieving columns by names vs by range, which is more performant ?

2011-11-03 Thread Ertio Lew
Retrieving columns by names vs by range: which is more performant, when
you have the option to do both ?


Re: Second Cassandra users survey

2011-11-03 Thread Ertio Lew
Provide an option to sort columns by timestamp, i.e., in the order they
have been added to the row, with the facility to use any column names.

On Wed, Nov 2, 2011 at 4:29 AM, Jonathan Ellis  wrote:

> Hi all,
>
> Two years ago I asked for Cassandra use cases and feature requests.
> [1]  The results [2] have been extremely useful in setting and
> prioritizing goals for Cassandra development.  But with the release of
> 1.0 we've accomplished basically everything from our original wish
> list. [3]
>
> I'd love to hear from modern Cassandra users again, especially if
> you're usually a quiet lurker.  What does Cassandra do well?  What are
> your pain points?  What's your feature wish list?
>
> As before, if you're in stealth mode or don't want to say anything in
> public, feel free to reply to me privately and I will keep it off the
> record.
>
> [1]
> http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html
> [2]
> http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html
> [3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: Cassandra Cluster Admin - phpMyAdmin for Cassandra

2011-10-31 Thread Ertio Lew
Thanks so much SebWajam for this great piece of work!

Is there a way to set a data type for displaying the column names/values
of a CF ? It seems that your project always uses StringSerializer for
any piece of data; however, in most real-world cases this is not
appropriate. Can we somehow configure which serializer to use while
reading the data, so that the data may be properly identified by your
project & delivered in a readable format ?

On Mon, Aug 22, 2011 at 7:17 AM, SebWajam  wrote:

> Hi,
>
> I'm working on this project for a few months now and I think it's mature
> enough to post it here:
> Cassandra Cluster Admin on GitHub
>
> Basically, it's a GUI for Cassandra. If you're like me and used MySQL for
> a while (and still using it!), you get used to phpMyAdmin and its simple
> and easy to use user interface. I thought it would be nice to have a
> similar tool for Cassandra and I couldn't find any, so I build my own!
>
> Supported actions:
>
>- Keyspace manipulation (add/edit/drop)
>- Column Family manipulation (add/edit/truncate/drop)
>- Row manipulation on column family and super column family
>(insert/edit/remove)
>- Basic data browser to navigate in the data of a column family (seems
>to be the favorite feature so far)
>- Support Cassandra 0.8+ atomic counters
>- Support management of multiple Cassandra clusters
>
> Bug report and/or pull request are always welcome!
>
> --
> View this message in context: Cassandra Cluster Admin - phpMyAdmin for
> Cassandra
> Sent from the cassandra-u...@incubator.apache.org mailing list archive
> at Nabble.com.
>


Re: Newbie question - fetching multiple columns of different datatypes and conversion from byte[]

2011-10-31 Thread Ertio Lew
Should column values or names of different datatypes first be read as byte
buffers & then converted to the appropriate type using Hector's provided
serializer API, in the way shown below ?

ByteBuffer bb;
..

String s = StringSerializer.get().fromByteBuffer(bb);


Or are there any better ways ?


ByteBuffer as an initial serializer to read columns with mixed datatypes ?

2011-10-30 Thread Ertio Lew
I have a mix of byte[] & Integer column names/values within a CF's rows.
So should ByteBuffer be my initial choice of serializer while making the
read query to the database for the mixed datatypes, & should I then
retrieve the byte[] or Integer from the ByteBuffer using the ByteBuffer
API's getInt() method ?

Is this a preferable way to read columns with integer/
byte[] names: initially as ByteBuffers & later converting them to Integer
or byte[] ?


Re: Authentication setup

2011-10-22 Thread Ertio Lew
Hey,

I too am looking for a similar thing. I guess this is a very common
requirement & may soon be provided as built-in functionality packed with
the cassandra setup.

Btw, it would be nice to hear if someone has ideas about how to implement
this for now.




On Fri, Oct 21, 2011 at 6:53 PM, Alexander Konotop <
alexander.kono...@gmail.com> wrote:

> Hello :-)
> Does anyone have a working config with normal secure authentication?
> I've just installed Cassandra 1.0.0 and see that SimpleAuthenticate is
> meant to be non-secure and was moved to examples. I need a production
> config - so I've tried to write this to config:
> 
> authenticator: org.apache.cassandra.auth.AuthenticatedUser
> authority: org.apache.cassandra.auth.AuthenticatedUser
> 
> But during cassandra startup log says:
> 
> org.apache.cassandra.config.ConfigurationException: No default
> constructor for authenticator class
> 'org.apache.cassandra.auth.AuthenticatedUser'.
> 
>
> As I understand either AuthenticatedUser is a wrong class or I simply
> don't know how to set it up - does it need additional configs similar to
> access.properties or passwd.properties? Maybe there's a way to store
> users in cassandra DB itself, like, fore example, MySQL does?
>
> I've searched and tried lot of things the whole day but the only info
> that I found were two phrases - first told that SimpleAuth is just a
> toy and second told to look into source to look for more auth methods.
> But, for example, this:
> 
> package org.apache.cassandra.auth;
>
> import java.util.Collections;
> import java.util.Set;
>
> /**
>  * An authenticated user and her groups.
>  */
> public class AuthenticatedUser
> {
>public final String username;
>public final Set groups;
>
>public AuthenticatedUser(String username)
>{
>this.username = username;
>this.groups = Collections.emptySet();
>}
>
>public AuthenticatedUser(String username, Set groups)
>{
>this.username = username;
>this.groups = Collections.unmodifiableSet(groups);
>}
>
>@Override
>public String toString()
>{
>return String.format("#", username, groups);
>}
> }
> 
> tells me just about nothing :-(
>
> Best regards
> Alexander
>


Using counters in 0.8

2011-05-18 Thread Ertio Lew
I am using Hector for a project & wanted to try out counters with the
latest 0.8 Cassandra.

How do we work with counters in the 0.8 version ? Any web links to such
examples are appreciated.
Has Hector started to provide an API for that ?


Columns values(integer) need frequent updates/ increments

2011-04-07 Thread Ertio Lew
Hi,

I am working on a Question/Answers web app using Cassandra (consider it
very similar to the StackOverflow sites). I need to build the reputation
system for users on the application. A user's reputation increases when
s/he answers somebody's question correctly. Thus if I keep the reputation
scores of users as column values, these columns are very frequently
updated, and I end up with several versions of a single column, which I
guess is very bad.

Similarly for the questions, the number of up-votes will increase very
frequently, & hence again I'll get several versions of the same column.

How should I try to minimize this ill effect ?

** What I thought of..
Try using a separate CF for the reputation system, so that the memtable
holds most of the columns (containing the reputation scores of the users).
Frequent updates will then update the column in the memtable, which means
easier reads as well as updates. These reputation columns are anyway small
& do not explode in number (they only replace another column).


Is it possible to get just a count of the no of columns in a row, in an efficient manner ?

2011-03-13 Thread Ertio Lew
Can I get just a count of the number of columns in a row without
deserializing all the columns in the row? Or should a counter
column be preferred that maintains the number of columns currently
present in the row, for situations where the total count is read more
frequently than the actual columns ?


Re: Using a synchronized counter that keeps track of no of users on the application & using it to allot UserIds/ keys to the new users after sign up

2011-02-28 Thread Ertio Lew
On Tue, Mar 1, 2011 at 1:26 AM, Aaron Morton  wrote:
> This is mostly from memory. But the last 12 ? (4096 decimal) bits are a 
> counter for the number of id's generated in a particular millisecond for that 
> server. You could use the high 4 bits in that range for your data type flags 
> and the low 8 for the counter.

So then I would be able to generate a maximum of up to 256 IDs per
millisecond (or 256,000 per second) on one machine!? That seems like a
very good limit for my use case. I don't think I would ever need more
than that, since my write volumes are well below that limit..
Should I go for it, or are there still other things to consider ?
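Aaron's suggestion, stealing the high 4 bits of the sequence range for a type flag and keeping 8 bits of counter, can be sketched like this. The exact field widths here are an assumption for illustration (41-bit millisecond timestamp, 10-bit worker, 4-bit type, 8-bit sequence), not snowflake's actual layout:

```java
public class PackedId {
    // Assumed layout: | 41-bit ms timestamp | 10-bit worker | 4-bit type | 8-bit seq |
    static long makeId(long millis, int worker, int type, int seq) {
        return (millis << 22) | ((long) worker << 12) | ((long) type << 8) | seq;
    }

    static long millisOf(long id) { return id >>> 22; }
    static int workerOf(long id)  { return (int) ((id >>> 12) & 0x3FF); }
    static int typeOf(long id)    { return (int) ((id >>> 8) & 0xF); }
    static int seqOf(long id)     { return (int) (id & 0xFF); }
}
```

With an 8-bit sequence the generator tops out at 256 IDs per worker per millisecond, matching the figure above, and IDs still sort roughly by time because the timestamp occupies the high bits.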

>
> Aaron
>
> On 1/03/2011, at 4:41 AM, Ertio Lew  wrote:
>
>> Hi Ryan,
>>
>> I am considering snowflake as an option for my usage with Cassandra
>> for a distributed application.
>> As I came to know snowflake uses 64 bits IDs. I am looking for a
>> solution that could help me generate 64 bits Ids
>> but in those 64 bits I would like at least 4 free bits so that I could
>> manipulate with those free bits to distinguish the two rows for a same
>> entity(split by kind of data) in same column family.
>>
>> If I could keep the snowflake's Id size to around 60 bits, that would
>> be great for my use case. Is it possible to manipulate the bits safely
>> to around 60 bits? Perhaps the microsecond precision is not required
>> to that much depth for my use case.
>>
>> Any kind of suggestions would be appreciated.
>>
>> Best Regards
>> Ertio Lew
>>
>>
>>
>>
>>
>>
>>
>> On Fri, Feb 4, 2011 at 1:09 AM, Ryan King  wrote:
>>> You could also consider snowflake:
>>>
>>> http://github.com/twitter/snowflake
>>>
>>> which gives you ids that roughly sort by time (but aren't sequential).
>>>
>>> -ryan
>>>
>>> On Thu, Feb 3, 2011 at 11:13 AM, Matthew E. Kennedy
>>>  wrote:
>>>> Unless you need your user identifiers to be sequential for some reason, I 
>>>> would save yourself the headache of this kind of complexity and just use 
>>>> UUIDs if you have to generate an identifier.
>>>>
>>>> On Feb 3, 2011, at 2:03 PM, Aklin_81 wrote:
>>>>
>>>>> Hi all,
>>>>> To generate new keys/ UserIds for new users on my application, I am
>>>>> thinking of using a simple synchronized counter that can keep track of
>>>>> the no. of users registered on my application and when a new user
>>>>> signs up, he can be allotted the next available id.
>>>>>
>>>>> Since Cassandra is eventually consistent, Is this advisable to
>>>>> implement with Cassandra, but then I could also use stronger
>>>>> consistency level like quorum or all for this purpose.
>>>>>
>>>>>
>>>>> Please let me know your thoughts and suggestions..
>>>>>
>>>>> Regards
>>>>> Asil
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> @rk
>>>
>


Re: Using a synchronized counter that keeps track of no of users on the application & using it to allot UserIds/ keys to the new users after sign up

2011-02-28 Thread Ertio Lew
Hi Ryan,

I am considering snowflake as an option for use with Cassandra
in a distributed application.
As I understand, snowflake uses 64-bit IDs. I am looking for a
solution that could help me generate 64-bit IDs,
but within those 64 bits I would like at least 4 free bits so that I
could manipulate those free bits to distinguish the two rows for the same
entity (split by kind of data) in the same column family.

If I could keep snowflake's ID size to around 60 bits, that would
be great for my use case. Is it possible to manipulate the bits safely
down to around 60 bits? Perhaps the timestamp precision is not required
to that much depth for my use case.

Any kind of suggestions would be appreciated.

Best Regards
Ertio Lew







On Fri, Feb 4, 2011 at 1:09 AM, Ryan King  wrote:
> You could also consider snowflake:
>
> http://github.com/twitter/snowflake
>
> which gives you ids that roughly sort by time (but aren't sequential).
>
> -ryan
>
> On Thu, Feb 3, 2011 at 11:13 AM, Matthew E. Kennedy
>  wrote:
>> Unless you need your user identifiers to be sequential for some reason, I 
>> would save yourself the headache of this kind of complexity and just use 
>> UUIDs if you have to generate an identifier.
>>
>> On Feb 3, 2011, at 2:03 PM, Aklin_81 wrote:
>>
>>> Hi all,
>>> To generate new keys/ UserIds for new users on my application, I am
>>> thinking of using a simple synchronized counter that can keep track of
>>> the no. of users registered on my application and when a new user
>>> signs up, he can be allotted the next available id.
>>>
>>> Since Cassandra is eventually consistent, Is this advisable to
>>> implement with Cassandra, but then I could also use stronger
>>> consistency level like quorum or all for this purpose.
>>>
>>>
>>> Please let me know your thoughts and suggestions..
>>>
>>> Regards
>>> Asil
>>
>>
>
>
>
> --
> @rk
>


Re: Specifying row caching on per query basis ?

2011-02-09 Thread Ertio Lew
Is this under consideration for future releases ? or being thought about!?



On Thu, Feb 10, 2011 at 12:56 AM, Jonathan Ellis  wrote:
> Currently there is not.
>
> On Wed, Feb 9, 2011 at 12:04 PM, Ertio Lew  wrote:
>> Is there any way to specify on per query basis(like we specify the
>> Consistency level), what rows be cached while you're reading them,
>> from a row_cache enabled CF. I believe, this could lead to much more
>> efficient use of the cache space!!( if you use same data for different
>> features/ parts in your application which have different caching
>> needs).
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Specifying row caching on per query basis ?

2011-02-09 Thread Ertio Lew
Is there any way to specify, on a per-query basis (like we specify the
consistency level), which rows should be cached while you're reading them
from a row_cache enabled CF? I believe this could lead to much more
efficient use of the cache space ( if you use the same data for different
features/ parts of your application which have different caching
needs).


Re: Merging the rows of two column families(with similar attributes) into one ??

2011-02-08 Thread Ertio Lew
Thanks for adding that, Benjamin!

On Wed, Feb 9, 2011 at 1:40 AM, Benjamin Coverston
 wrote:
>
>
> On 2/4/11 11:58 PM, Ertio Lew wrote:
>>
>> Yes, a disadvantage of more no. of CF in terms of memory utilization
>> which I see is: -
>>
>> if some CF is written less often as compared to other CFs, then the
>> memtable would consume space in the memory until it is flushed, this
>> memory space could have been much better used by a CF that's heavily
>> written and read. And if you try to make the thresholds for flush
>> smaller then more compactions would be needed.
>>
>>
> One more disadvantage here is that with CFs that vary widely in the write
> rate you can also end up with fragmented commit logs which in some cases we
> have seen actually fill up the commit log partition. As a consequence one
> thing to consider would be to lower the commit log flush threshold (in
> minutes) to something lower for the column families that do not see heavy
> use.
>
>>
>>
>> On Sat, Feb 5, 2011 at 11:58 AM, Ertio Lew  wrote:
>>>
>>> Thanks Tyler !
>>>
>>> I could not fully understand the reason why more no of column families
>>> would mean more memory.. if you have under control parameters like
>>> memtable_throughput & memtable_operations which are set on a per column
>>> family basis then you can directly control & adjust by splitting the
>>> memory space between two CFs in proportion to what you would do in a
>>> single CF.
>>> Hence there should be no extra memory consumption for multiple CFs
>>> that have been split from single one??
>>>
>>> Regarding the compactions, I think even if they are more the size of
>>> the SST files to be compacted is smaller as the data has been split
>>> into two.
>>> Then more compactions but smaller too!!
>>>
>>>
>>> Then, provided the same amount of data, how can greater no of column
>>> families could be a bad option(if you split the values of parameters
>>> for memory consumption proportionately) ??
>>>
>>> --
>>> Regards,
>>> Ertio
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Feb 5, 2011 at 10:43 AM, Tyler Hobbs  wrote:
>>>>>
>>>>> I read somewhere that more no of column families is not a good idea as
>>>>> it consumes more memory and more compactions to occur
>>>>
>>>> This is primarily true, but not in every case.
>>>>
>>>>> But the caching requirements may be different as they cater to two
>>>>> different features.
>>>>
>>>> This is a great reason to *not* merge them.  Besides the key and row
>>>> caches,
>>>> don't forget about the OS buffer cache.
>>>>
>>>>> Is it recommended to merge these two column families into one ??
>>>>> Thoughts
>>>>> ?
>>>>
>>>> No, this sounds like an anti-pattern to me.  The overhead from having
>>>> two
>>>> separate CFs is not that high.
>>>>
>>>> --
>>>> Tyler Hobbs
>>>> Software Engineer, DataStax
>>>> Maintainer of the pycassa Cassandra Python client library
>>>>
>>>>
>


Re: Merging the rows of two column families(with similar attributes) into one ??

2011-02-05 Thread Ertio Lew
Thanks Tyler!

I think I'll have to very carefully take all these factors into
consideration before deciding how to split my data into CFs, as this has
no objective answer. I am expecting at least around 8 column
families for my entire application, if I split the data strictly
according to the various features and requirements of the application.

I think there should have been a provision for specifying, on a per-query
basis, which rows to cache while you're reading them from a row_cache
enabled CF. Then you could easily merge similar data for different
features of your application into a single CF. I believe this would
also have led to much more efficient use of the cache space ( if you
were using the same data for different parts of your app which have
different caching needs).

Regards,

Ertio

On Sun, Feb 6, 2011 at 1:22 AM, Tyler Hobbs  wrote:
>> if you have under control parameters like
>> memtable_throughput & memtable_operations which are set per column
>> family basis then you can directly control & adjust by splitting the
>> memory space between two CFs in proportion to what you would do in
>> single CF.
>> Hence there should be no extra memory consumption for multiple CFs
>> that have been split from single one??
>
> Yes, I think you have the right idea here.  There is a small amount of
> overhead for the extra memtable and keeping track of a second set of
> indexes, bloom filters, sstables, etc.
>
>> Regarding the compactions, I think even if they are more the size of
>> the SST files to be compacted is smaller as the data has been split
>> into two.
>> Then more compactions but smaller too!!
>
> Yes.
>
>> if some CF is written less often as compared to other CFs, then the
>> memtable would consume space in the memory until it is flushed, this
>> memory space could have been much better used by a CF that's heavily
>> written and read. And if you try to make the thresholds for flush
>> smaller then more compactions would be needed.
>
> If you merge the two CFs together, then updates to the 'less frequent' rows
> will still consume memory, only it will all be within one memtable.
> (Memtables grow in size until they are flushed, they don't reserve some set
> amount of memory.)  Furthermore, because your memtables will be filled up by
> the 'more frequent' rows, the 'less frequent' rows will get fewer
> updates/overwrites in memory, so they will tend to be spread across a
> greater number of SSTables.
>
> --
> Tyler Hobbs
> Software Engineer, DataStax
> Maintainer of the pycassa Cassandra Python client library
>
>
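[Editor's note] Tyler's point that infrequently updated rows end up spread across more SSTables can be sketched with a toy flush model in Python (the keys, flush interval, and data here are invented for illustration, not Cassandra internals):

```python
# Toy model of a shared memtable flushed every 3 writes: a rarely-updated
# row still lands in every flushed "sstable", spreading it across disk.
writes = ["hot", "hot", "rare", "hot", "hot", "rare", "hot", "hot", "rare"]
sstables, memtable = [], {}
for i, key in enumerate(writes, 1):
    memtable[key] = i                  # overwrite in place, like a memtable
    if i % 3 == 0:                     # flush threshold reached
        sstables.append(dict(memtable))
        memtable = {}

# 'rare' was written only 3 times, yet it appears in all 3 sstables.
assert all("rare" in s for s in sstables)
assert len(sstables) == 3
```

With the 'hot' rows filling each memtable, the 'rare' row never gets a chance to be consolidated in memory, which is the spreading effect described above.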


Re: Merging the rows of two column families (with similar attributes) into one?

2011-02-04 Thread Ertio Lew
Yes, one disadvantage of a larger number of CFs, in terms of memory
utilization, that I see is this:

if some CF is written to less often than other CFs, its memtable still
consumes memory until it is flushed; that memory could be put to much
better use by a CF that is heavily written and read. And if you try to
lower the flush thresholds, more compactions will be needed.





On Sat, Feb 5, 2011 at 11:58 AM, Ertio Lew  wrote:
> Thanks Tyler !
>
> I could not fully understand the reason why more no of column families
> would mean more memory.. if you have under control parameters like
> memtable_throughput & memtable_operations which are set per column
> family basis then you can directly control & adjust by splitting the
> memory space between two CFs in proportion to what you would do in
> single CF.
> Hence there should be no extra memory consumption for multiple CFs
> that have been split from single one??
>
> Regarding the compactions, I think even if they are more the size of
> the SST files to be compacted is smaller as the data has been split
> into two.
> Then more compactions but smaller too!!
>
>
> Then, provided the same amount of data, how can greater no of column
> families could be a bad option(if you split the values of parameters
> for memory consumption proportionately) ??
>
> --
> Regards,
> Ertio
>
>
>
>
>
> On Sat, Feb 5, 2011 at 10:43 AM, Tyler Hobbs  wrote:
>>
>>> I read somewhere that more no of column families is not a good idea as
>>> it consumes more memory and more compactions to occur
>>
>> This is primarily true, but not in every case.
>>
>>> But the caching requirements may be different as they cater to two
>>> different features.
>>
>> This is a great reason to *not* merge them.  Besides the key and row caches,
>> don't forget about the OS buffer cache.
>>
>>> Is it recommended to merge these two column families into one ?? Thoughts
>>> ?
>>
>> No, this sounds like an anti-pattern to me.  The overhead from having two
>> separate CFs is not that high.
>>
>> --
>> Tyler Hobbs
>> Software Engineer, DataStax
>> Maintainer of the pycassa Cassandra Python client library
>>
>>
>


Re: Merging the rows of two column families (with similar attributes) into one?

2011-02-04 Thread Ertio Lew
Thanks Tyler!

I could not fully understand why more column families would mean more
memory. If parameters like memtable_throughput and memtable_operations,
which are set on a per-column-family basis, are under your control,
then you can directly split the memory budget between two CFs in the
same proportion you would use within a single CF.
So shouldn't there be no extra memory consumption for multiple CFs
that have been split from a single one?

Regarding compactions: I think that even if there are more of them,
the SSTable files being compacted are smaller, since the data has been
split in two.
So more compactions, but smaller ones too!


Then, given the same amount of data, how can a greater number of
column families be a bad option (if you split the memory-related
parameter values proportionately)?

--
Regards,
Ertio





On Sat, Feb 5, 2011 at 10:43 AM, Tyler Hobbs  wrote:
>
>> I read somewhere that more no of column families is not a good idea as
>> it consumes more memory and more compactions to occur
>
> This is primarily true, but not in every case.
>
>> But the caching requirements may be different as they cater to two
>> different features.
>
> This is a great reason to *not* merge them.  Besides the key and row caches,
> don't forget about the OS buffer cache.
>
>> Is it recommended to merge these two column families into one ?? Thoughts
>> ?
>
> No, this sounds like an anti-pattern to me.  The overhead from having two
> separate CFs is not that high.
>
> --
> Tyler Hobbs
> Software Engineer, DataStax
> Maintainer of the pycassa Cassandra Python client library
>
>


Merging the rows of two column families (with similar attributes) into one?

2011-02-04 Thread Ertio Lew
I read somewhere that having more column families is not a good idea,
as it consumes more memory and causes more compactions, so I am trying
to reduce the number of column families by adding the rows of other
column families (with similar attributes) as separate rows into one.

I have two kinds of data for two separate features of my application.
If I store them in two different column families, both will have
similar attributes, like the same comparator type and sorting needs.
So I could also merge both into one column family, simply by adding
the rows of one to the other (increasing the number of rows).
However, some rows of the 1st kind of data are used very frequently,
while rows of the 2nd kind are used less frequently. But I don't think
this will be a problem, as I am not merging two rows into one, just
adding them as separate rows in the column family.
The 1st kind of data has wide rows; the 2nd kind has much narrower rows.

But the caching requirements may differ, as the two cater to different
features. (Though I think this could even be advantageous, since cache
resources are free to be used by whichever data is more frequently
accessed.)


Is it recommended to merge these two column families into one? Thoughts?

--

Ertio
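[Editor's note] The kind of merge being proposed can be sketched with plain Python dicts standing in for column families; the row-key formats and field names below are invented purely for illustration:

```python
# Two logical datasets merged into one "column family" (a dict of rows),
# kept apart by namespacing the row keys with a per-dataset prefix.
posts = {"g1:p1": {"title": "hello"}}            # wide, frequently read rows
members = {"g1": {"user3": "", "user7": ""}}     # narrow, rarely read rows

merged = {}
for key, cols in posts.items():
    merged["post:" + key] = cols
for key, cols in members.items():
    merged["members:" + key] = cols

assert merged["post:g1:p1"]["title"] == "hello"
assert "user3" in merged["members:g1"]
```

The prefixes keep the two datasets from colliding, but as discussed in the thread, they still share one memtable, one set of caches, and one compaction stream.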


Re: Can the same key exist for two rows in two different column families without clashing?

2011-02-02 Thread Ertio Lew
Thanks Stephen for the Great Explanation!



On Wed, Feb 2, 2011 at 4:31 PM, Stephen Connolly <
stephen.alan.conno...@gmail.com> wrote:

> On 2 February 2011 10:03, Ertio Lew  wrote:
> > Can a same key exists for two rows in two different column families
> without
> > clashing ?  Other words, does the same algorithm needs to enforced for
> > generating keys for different column families or can different
> > algorithms(for generating keys) be enforced on column family basis?
> >
> > I have tried out that they can, but I wanted to know if there may be any
> > problems associated with this.
> >
> > Thanks.
> > Ertio Lew
> >
>
> it is a bad analogy for many reasons but if you replace "row key" with
> "primary key" and "column family" with "table" then you might get an
> answer.
>
> a better analogy is to think of the following.
>
> public class Keyspace {
>
>  public final Map<String, Map<String, byte[]>> columnFamily1;
>
>  public final Map<String, Map<String, byte[]>> columnFamily2;
>
>  public final Map<String, Map<String, Map<String, byte[]>>>
> superColumnFamily3;
>
> }
>
> (still not quite correct, but mostly so for our purposes);
>
> you are asking given
>
> Keyspace keyspace;
> String key1 = makeKeyAlg1();
> keyspace.columnFamily1.put(key1,...);
>
> String key2 = makeKeyAlg2();
> keyspace.columnFamily2.put(key2,...);
>
> when key1.equals(key2)
>
> then is there a problem?
>
> They are two separate maps... why would there be.
>
> -Stephen
>
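[Editor's note] Stephen's "two separate maps" point can be demonstrated concretely with plain dictionaries standing in for column families (a sketch, not Cassandra API; the key and values are invented):

```python
# Two separate "column families" modeled as independent dicts.
cf1 = {}
cf2 = {}

key = "user42"                       # the same row key, used in both CFs
cf1[key] = {"name": "Ertio"}
cf2[key] = {"last_login": "2011-02-02"}

# No clash: each CF is its own map, keyed independently.
assert cf1[key] == {"name": "Ertio"}
assert cf2[key] == {"last_login": "2011-02-02"}
assert cf1[key] != cf2[key]
```

Since each column family has its own key space, the key-generation algorithm can differ per CF without any interference.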


Can the same key exist for two rows in two different column families without clashing?

2011-02-02 Thread Ertio Lew
Can the same key exist for two rows in two different column families without
clashing? In other words, does the same key-generation algorithm need to be
enforced across column families, or can different algorithms be used on a
per-column-family basis?

I have tried out that they can, but I wanted to know if there may be any
problems associated with this.

Thanks.
Ertio Lew


Re: Is it recommended to store two types of data (not related to each other but needing to be retrieved together) in one super column family?

2011-01-29 Thread Ertio Lew
Could someone please point me in the right direction by commenting on the
above ideas?

On Fri, Jan 28, 2011 at 11:50 PM, Ertio Lew  wrote:

> Hi,
>
> I have two kinds of data that I would like to fit in one super column
> family; I am trying this, for the reasons of implementing fast
> database retrievals by combining the data of two rows into just one
> row.
>
> First kind of data, in supercolumn family, is named with timeUUIDs as
> supercolumn names; Think of this as, the postIds of posts in a Group.
> These posts will need to be sorted by time (so that list of latest
> posts is retrieved). Thus each post has one supercolumn each with name
> as (timeUUID+userID) and sorted by timeUUIDtype.
>
> Second kind of data would be just a single supercolumn containing
> columns of userId of all members in a group(very small). (The no of
> members in group will be around 40-50 max). The name of this single
> supercolumn may be kept suitable(perhaps max. time in future ) so as
> to keep this supercolumn to the beginning.
>
> (The supercolumns are required as we need to store some additional
> data in the columns of 1st kind of data).
>
> So is it recommended to store these two types of data (not related to
> each other but need to be retrieved together) in one super column
> family ?
>


Is it recommended to store two types of data (not related to each other but needing to be retrieved together) in one super column family?

2011-01-28 Thread Ertio Lew
Hi,

I have two kinds of data that I would like to fit in one supercolumn
family; I am trying this in order to implement fast database
retrievals by combining the data of two rows into just one row.

The first kind of data in the supercolumn family uses timeUUIDs as
supercolumn names. Think of these as the postIds of posts in a group.
These posts need to be sorted by time (so that the list of latest
posts can be retrieved). Thus each post gets one supercolumn, with its
name as (timeUUID+userID) and sorted by TimeUUIDType.

The second kind of data would be just a single supercolumn containing
columns for the userIds of all members in a group (very small; the
number of members in a group will be around 40-50 max). The name of
this single supercolumn could be chosen suitably (perhaps a max time
in the future) so as to keep this supercolumn at the beginning.

(The supercolumns are required because we need to store some
additional data in the columns of the 1st kind of data.)

So is it recommended to store these two types of data (not related to
each other but needing to be retrieved together) in one supercolumn
family?
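[Editor's note] The proposed layout can be sketched with nested dicts, with `sorted()` standing in for the supercolumn comparator. All names and the sentinel choice below are illustrative; the sentinel here is picked to sort first under plain string ordering, whereas the "max time in the future" name proposed above depends on the comparator and read direction:

```python
# One group's row: supercolumn name -> {column name: value}.
row = {
    "2011-01-28T10:00|user7": {"postId": "p1", "meta": "extra data"},
    "2011-01-28T11:30|user3": {"postId": "p2", "meta": "extra data"},
    # Sentinel supercolumn holding the member list; its name is chosen
    # so it sorts before every timestamp-based supercolumn name.
    "!members": {"user3": "", "user7": ""},
}

names = sorted(row)   # stands in for the CF comparator ordering
assert names[0] == "!members"                     # member list comes first
assert set(row["!members"]) == {"user3", "user7"}
```

One slice from the start of the row then returns the member list plus the newest-or-oldest posts in a single read, which is the retrieval pattern the question is after.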


Re: What is the best possible client option available to a PHP developer for implementing an application ready for production environments?

2011-01-18 Thread Ertio Lew
In that case, I think we will need to go with a full Java
implementation, so that we can use Hector, as we have not found any
better option.

@Dave: Thanks for the links, but we would rather not go with a direct
Thrift implementation because of the frequently changing API and other
complexities there.

Also, we would not like to lock ourselves into an implementation
language whose client options have limitations that we can bear now
but not necessarily in the future.

If anybody else has a better solution to this please let me know.

Thank you all.
Ertio Lew


On Tue, Jan 18, 2011 at 2:49 PM, Dave Gardner  wrote:
> I can't comment of phpcassa directly, but we use Cassandra plus PHP in
> production without any difficulties. We are happy with the
> performance.
>
> Most of the information we needed to get started we found here:
>
> https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP
>
> This includes details on how to compile the native PHP C Extension for
> Thrift. We use a bespoke client which wraps the Thrift interface.
>
> You may be better off with a higher-level client, although when we were
> starting out there was less of a push away from Thrift directly. I
> found using Thrift useful as you gain an appreciation for what calls
> Cassandra actually supports. One potential advantage of using a higher
> level client is that it may protect you from the frequent Thrift
> interface changes which currently seem to accompany every major
> release.
>
> Dave
>
>
>
>
> On Tuesday, 18 January 2011, Tyler Hobbs  wrote:
>>
>> 1.) Is it developed to the level needed to support all the
>> necessary features to take full advantage of Cassandra?
>>
>> Yes.  There aren't some of the niceties of pycassa yet, but you can do 
>> everything that Cassandra offers with it.
>>
>>
>> 2. )  Is it used in production by anyone ?
>>
>> Yes, I've talked to a few people at least who are using it in production.  
>> It tends to play a limited role instead of a central one, though.
>>
>>
>> 3. )  What are its limitations?
>>
>> Being written in PHP.  Seriously.  The lack of universal 64bit integer 
>> support can be problematic if you don't have a fully 64bit system.  PHP is 
>> fairly slow.  PHP makes a few other things less easy to do.  If you're doing 
>> some pretty lightweight interaction with Cassandra through PHP, these might 
>> not be a problem for you.
>>
>> - Tyler
>>
>>
>
> --
> *Dave Gardner*
> Technical Architect
>
> *Imagini Europe Limited*
> 7 Moor Street, London W1D 5NB
>
> +44 20 7734 7033
> skype: daveg79
> dave.gard...@imagini.net
> http://www.visualdna.com
>
> Imagini Europe Limited, Company number 5565112 (England
> and Wales), Registered address: c/o Bird & Bird,
> 90 Fetter Lane, London, EC4A 1EQ, United Kingdom
>


What is the best possible client option available to a PHP developer for implementing an application ready for production environments?

2011-01-17 Thread Ertio Lew
What would be the best client option for using Cassandra from an
application implemented in PHP?

It seems that PHP developers face a high barrier to entry into
Cassandra's world because of the unavailability of relatively mature,
well-developed, and proven client options (like Hector for Java
developers) that fit the requirements and provide the features needed
for production environments.

In this case, what would be the best option for using Cassandra in
production? Implementing in a different language like Java? Or using
the Thrift library directly?

I know most Cassandra implementations are Java based, so is that route
preferable? It's not very easy to go with a Java-based application for
small companies with little manpower.

I am unable to make an easy decision. Please help me make a more
performance-centered decision for the application.

Thanks.
Ertio Lew


P.S. In any case, if you suggest a client option, please also list any
major implementations that use it.


Re: Do you have a site in a production environment with Cassandra? What client do you use?

2011-01-14 Thread Ertio Lew
What technology stack do you use?

On 1/14/11, Ran Tavory  wrote:
> I use Hector, if that counts...
> On Jan 14, 2011 7:25 PM, "Ertio Lew"  wrote:
>> Hey,
>>
>> If you have a site in production environment or considering so, what
>> is the client that you use to interact with Cassandra. I know that
>> there are several clients available out there according to the
>> language you use but I would love to know what clients are being used
>> widely in production environments and are best to work with(support
>> most required features for performance).
>>
>> Also preferably tell about the technology stack for your applications.
>>
>> Any suggestions, comments appreciated ?
>>
>> Thanks
>> Ertio
>


Do you have a site in a production environment with Cassandra? What client do you use?

2011-01-14 Thread Ertio Lew
Hey,

If you have a site in a production environment, or are considering
one, what client do you use to interact with Cassandra? I know there
are several clients available out there, depending on the language you
use, but I would love to know which clients are being used widely in
production environments and are best to work with (supporting most of
the features required for performance).

Also, preferably, tell us about the technology stack for your applications.

Any suggestions or comments are appreciated.

Thanks
Ertio


Are you using phpcassa for any application currently in production, or considering it?

2011-01-13 Thread Ertio Lew
I need to choose from among several client options for working with
Cassandra on a serious web application for production environments. I
prefer to work with PHP, but I am not sure whether phpcassa would be
the best choice, given that I am open to working with other languages
as well.

PHP developers are normally in a huge majority everywhere, but I have
found it rather difficult to see that majority here. Do you have a
setup in production, or are you considering one?