Re: Increasing thrift_framed_transport_size_in_mb

2011-09-23 Thread Jonathan Ellis
Really large messages are not encouraged because they will fragment
your heap quickly.  Other than that, no.
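The heap concern can be made concrete. In a framed transport, each message arrives as a 4-byte length followed by the body, and the reader allocates one contiguous array of that length before reading. The sketch below is a simplified stand-in that uses only java.nio. The FrameReader class and its maxFrameBytes limit are hypothetical; this is not Thrift's TFramedTransport code. It shows why a 100 MB frame limit means a single request may trigger a 100 MB allocation, which is the kind of large object that fragments the heap.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Simplified illustration of reading length-prefixed frames; not Thrift's actual code.
class FrameReader {
    private final int maxFrameBytes; // hypothetical limit, analogous to the yaml setting

    FrameReader(int maxFrameBytes) {
        this.maxFrameBytes = maxFrameBytes;
    }

    // Reads one frame: a 4-byte big-endian length (ByteBuffer's default order),
    // then exactly that many body bytes.
    byte[] readFrame(ByteBuffer in) {
        int length = in.getInt();
        if (length < 0 || length > maxFrameBytes) {
            // Reject before allocating, so an oversized length cannot force a huge array.
            throw new IllegalArgumentException(
                "Frame size " + length + " exceeds limit " + maxFrameBytes);
        }
        // One contiguous allocation per message: with a 100 MB limit,
        // this array can itself be up to 100 MB.
        byte[] body = new byte[length];
        in.get(body);
        return body;
    }

    // Builds a frame in the same format, for testing.
    static ByteBuffer frame(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        buf.putInt(payload.length).put(payload);
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer framed = frame("hello".getBytes(StandardCharsets.UTF_8));
        byte[] body = new FrameReader(16 * 1024 * 1024).readFrame(framed);
        System.out.println(new String(body, StandardCharsets.UTF_8)); // hello
    }
}
```

Rejecting the frame before allocating is what makes a limit useful as a guard. Raising the limit removes that guard for large payloads as well.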

On Fri, Sep 23, 2011 at 3:40 PM, Sanjeev Kulkarni  wrote:
> Hey guys,
> Are there any side-effects of increasing
> the thrift_framed_transport_size_in_mb and thrift_max_message_length_in_mb
> variables from their default values to something like 100mb?
> Thanks!



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: ColumnFamily per Index?

2011-09-23 Thread Edward Capriolo
On Fri, Sep 23, 2011 at 5:28 PM, Ron Siemens wrote:

>
> I have a column family for my main data, and I have been using an
> additional column family to store indexes to the data: row per index style.
>
> I now want to be able to update a set of indexes by the field being indexed
> on.  To access that set, I can maintain meta indexes for each field, or I
> can switch from RP to OPP and do range queries on field-prefixed index
> names.  Both options seem less than ideal.  The first is annoying extra
> maintenance, and the second means I may open a can of worms with
> load-balancing.
>
> I concluded another option might be better with neither of those drawbacks.
>  I can just create a ColumnFamily per field being indexed.  I can now easily
> access and update the indexes for a particular field.
>
> I'm wondering if anyone has also contemplated this
> Column-Family-per-Index-Field option or is using it, and has any thoughts or
> critique regarding it.
>
> Ron
>


I am doing column family per index with casbase:
https://github.com/edwardcapriolo/casbase. However, if you look into
dynamic composite columns, the use case they were made for is storing
multiple indexes of different types in the same column family.
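The single-column-family alternative can be illustrated roughly as follows. If each column name in an index row is a composite of (indexed field, indexed value, row key), columns sort first by field. All entries for one field are then contiguous and can be read or updated with a single slice. The sketch below models one such row as an in-memory TreeMap. The IndexRow class and its plain-string components are hypothetical. Real dynamic composite columns also encode each component's type and compare components with per-type comparators, which this does not attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Models one wide index row whose column names are (field, value, rowKey)
// composites, kept sorted the way a composite comparator would sort them.
// Illustration only; not Cassandra's DynamicCompositeType encoding.
class IndexRow {
    static final class Name {
        final String field;
        final String value;
        final String rowKey;

        Name(String field, String value, String rowKey) {
            this.field = field;
            this.value = value;
            this.rowKey = rowKey;
        }
    }

    // Component-by-component comparison: field first, then value, then row key.
    static int compare(Name a, Name b) {
        int c = a.field.compareTo(b.field);
        if (c != 0) return c;
        c = a.value.compareTo(b.value);
        if (c != 0) return c;
        return a.rowKey.compareTo(b.rowKey);
    }

    private final TreeMap<Name, Boolean> columns = new TreeMap<Name, Boolean>(IndexRow::compare);

    void add(String field, String value, String rowKey) {
        columns.put(new Name(field, value, rowKey), Boolean.TRUE);
    }

    void remove(String field, String value, String rowKey) {
        columns.remove(new Name(field, value, rowKey));
    }

    // Slice of every column for one field: from (field, "", "") inclusive up to
    // (field + '\0', "", "") exclusive, since field + '\0' is the next string after field.
    List<String> rowKeysFor(String field) {
        Name from = new Name(field, "", "");
        Name to = new Name(field + '\0', "", "");
        List<String> keys = new ArrayList<>();
        for (Name n : columns.subMap(from, true, to, false).keySet()) {
            keys.add(n.rowKey);
        }
        return keys;
    }

    public static void main(String[] args) {
        IndexRow row = new IndexRow();
        row.add("email", "a@example.com", "user1");
        row.add("city", "Paris", "user2");
        row.add("email", "b@example.com", "user3");
        row.add("city", "Oslo", "user1");
        System.out.println(row.rowKeysFor("email")); // [user1, user3]
        System.out.println(row.rowKeysFor("city"));  // [user1, user2]
    }
}
```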


ColumnFamily per Index?

2011-09-23 Thread Ron Siemens

I have a column family for my main data, and I have been using an additional 
column family to store indexes to the data: row per index style.

I now want to be able to update a set of indexes by the field being indexed on. 
To access that set, I can maintain meta indexes for each field, or I can 
switch from RP to OPP and do range queries on field-prefixed index names.  Both 
options seem less than ideal.  The first is annoying extra maintenance, and the 
second means I may open a can of worms with load-balancing.

I concluded another option might be better with neither of those drawbacks.  I 
can just create a ColumnFamily per field being indexed.  I can now easily 
access and update the indexes for a particular field.

I'm wondering if anyone has also contemplated this 
Column-Family-per-Index-Field option or is using it, and has any thoughts or 
critique regarding it.

Ron
 

Re: Storing (python) objects

2011-09-23 Thread Edward Capriolo
On Fri, Sep 23, 2011 at 1:41 PM, Ian Danforth  wrote:

> Good feedback from all. Thanks!
>
> Ian
>
> On Fri, Sep 23, 2011 at 7:48 AM, Tristan Seligmann <
> mithra...@mithrandi.net> wrote:
>
>> On Fri, Sep 23, 2011 at 1:09 AM, Alexis Lê-Quôc 
>> wrote:
>> > For data accessed through a single path, I use the same trick: pickle,
>> > bz2 and insert.
>>
>> Note that unpickling a pickle in Python involves a) arbitrary code
>> execution, and b) relies on your code being the same (or close enough)
>> to what it was when the pickle was created, so it is generally a very
>> bad choice for persistent data serialization.
>> --
>> mithrandi, i Ainil en-Balandor, a faer Ambar
>>
>
>
I am working on something similar:
https://github.com/edwardcapriolo/Cassandra-AnyType. One of the features I
want to get at is being able to serialize any Comparable object to JSON
using Google Gson. Doing this will allow storage of any Java object as JSON,
and the fields should sort by the same rules as compareTo(). (Still a work in
progress.)
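For readers weighing the language-neutral route, here is a minimal sketch that encodes nested maps and lists as JSON using only the JDK. Keys are sorted so that the same data always produces the same text. MiniJson is a hypothetical helper, not Gson and not Cassandra-AnyType, and a real project would use an established JSON library. It handles only strings, numbers, booleans, null, maps, and lists, and it does not guard against NaN or Infinity.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Minimal JSON encoder for nested Map/List data; illustrative only.
class MiniJson {
    static String encode(Object value) {
        StringBuilder sb = new StringBuilder();
        write(value, sb);
        return sb.toString();
    }

    private static void write(Object v, StringBuilder sb) {
        if (v == null) {
            sb.append("null");
        } else if (v instanceof String) {
            writeString((String) v, sb);
        } else if (v instanceof Number || v instanceof Boolean) {
            sb.append(v); // NaN/Infinity would produce invalid JSON; not handled here
        } else if (v instanceof Map) {
            // Copy into a TreeMap so keys are emitted in sorted, deterministic order.
            Map<String, Object> sorted = new TreeMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) v).entrySet()) {
                sorted.put(String.valueOf(e.getKey()), e.getValue());
            }
            sb.append('{');
            boolean first = true;
            for (Map.Entry<String, Object> e : sorted.entrySet()) {
                if (!first) sb.append(',');
                first = false;
                writeString(e.getKey(), sb);
                sb.append(':');
                write(e.getValue(), sb);
            }
            sb.append('}');
        } else if (v instanceof Collection) {
            sb.append('[');
            boolean first = true;
            for (Object item : (Collection<?>) v) {
                if (!first) sb.append(',');
                first = false;
                write(item, sb);
            }
            sb.append(']');
        } else {
            throw new IllegalArgumentException("Unsupported type: " + v.getClass());
        }
    }

    // Writes a quoted JSON string, escaping quotes, backslashes and control characters.
    private static void writeString(String s, StringBuilder sb) {
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        sb.append('"');
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("tags", Arrays.asList("a", "b"));
        doc.put("name", "ian");
        System.out.println(encode(doc)); // {"name":"ian","tags":["a","b"]}
    }
}
```

Sorting keys gives a deterministic encoding, which helps when serialized values are compared or deduplicated. It does not make the encoded bytes sort in the object's natural order, which is the harder problem Cassandra-AnyType is aiming at.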


Increasing thrift_framed_transport_size_in_mb

2011-09-23 Thread Sanjeev Kulkarni
Hey guys,
Are there any side-effects of increasing
the thrift_framed_transport_size_in_mb and thrift_max_message_length_in_mb
variables from their default values to something like 100 MB?
Thanks!


Re: [VOTE] Release Mojo's Cassandra Maven Plugin 0.8.6-1

2011-09-23 Thread Stephen Connolly
This vote has passed:

+1: Me, Colin & Nate
0:
-1:

I will proceed with the release

-Stephen

On 20 September 2011 15:27, Stephen Connolly
 wrote:
> Hi,
>
> I'd like to release version 0.8.6-1 of Mojo's Cassandra Maven Plugin
> to sync up with the recent 0.8.6 release of Apache Cassandra.
>
>
> We solved 2 issues:
> http://jira.codehaus.org/secure/ReleaseNote.jspa?projectId=12121&version=17425
>
>
> Staging Repository:
> https://nexus.codehaus.org/content/repositories/orgcodehausmojo-010/
>
> Site:
> http://mojo.codehaus.org/cassandra-maven-plugin/index.html
>
> SCM Tag:
> https://svn.codehaus.org/mojo/tags/cassandra-maven-plugin-0.8.6-1@14748
>
>  [ ] +1 Yeah! fire ahead oh and the blind man on the galloping horse
> says it looks fine too.
>  [ ] 0 Mehhh! like I care, I don't have any opinions either, I'd
> follow somebody else if only I could decide who
>  [ ] -1 No! wait up there I have issues (in general like, ya know,
> and being a trouble-maker is only one of them)
>
> The vote is open for 72 hours and will succeed by lazy consensus.
>
> Cheers
>
> -Stephen
>
> P.S.
>  In the interest of ensuring (more is) better testing, this vote is
> also open to subscribers of the dev and user@cassandra.apache.org
> mailing lists
>


Re: MessagingService.sendOneWay sending blank bytes?

2011-09-23 Thread Jonathan Ellis
The full backport is beyond the scope of what I'm comfortable with in a
stable release series, but the asByteArray fix sounds reasonable to
me.  Can you create a ticket + patch?
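The waste Greg describes can be reproduced with plain JDK classes. The sketch below uses a hypothetical ExposedBuffer, a subclass of ByteArrayOutputStream standing in for Cassandra's DataOutputBuffer, which exposes its capacity-sized backing array. Wrapping the whole array sends the unused capacity too. Wrapping only the bytes actually written sends just the payload. A trimmed copy also sends just the payload, and that is assumed here to be what asByteArray() returns.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Stand-in for a buffer that, like DataOutputBuffer, exposes its backing array.
// ByteArrayOutputStream keeps it in the protected fields buf (capacity-sized)
// and count (number of bytes actually written).
class ExposedBuffer extends ByteArrayOutputStream {
    ExposedBuffer(int initialCapacity) {
        super(initialCapacity);
    }

    // Buggy pattern: wraps the entire backing array, so the unused
    // capacity after the real data is sent as well.
    ByteBuffer wrapWholeArray() {
        return ByteBuffer.wrap(buf);
    }

    // Fixed pattern: wraps only the first count bytes. This shares the array,
    // so the buffer must not be reused until the ByteBuffer has been sent.
    ByteBuffer wrapWrittenBytes() {
        return ByteBuffer.wrap(buf, 0, count);
    }

    public static void main(String[] args) {
        ExposedBuffer out = new ExposedBuffer(128);
        byte[] message = "a small message".getBytes(StandardCharsets.UTF_8);
        out.write(message, 0, message.length);
        System.out.println(out.wrapWholeArray().remaining());   // 128
        System.out.println(out.wrapWrittenBytes().remaining()); // 15
        // toByteArray() copies exactly size() bytes: a copying fix.
        System.out.println(out.toByteArray().length);           // 15
    }
}
```

Copying costs one extra array allocation per message but avoids sharing a reusable buffer. Wrapping with an explicit length avoids the copy.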

On Fri, Sep 23, 2011 at 8:04 AM, Greg Hinkle  wrote:
> Is it worth a backport, or at least a switch to asByteArray, for 0.8.7? It's a 
> sizable amount of wasted network traffic and the fix seems pretty safe. (It's 
> working for me.)
>
> Greg Hinkle
>
> On Sep 23, 2011, at 3:32 AM, Jonathan Ellis wrote:
>
>> Yes.  This is one of the things fixed for 1.0 in
>> https://issues.apache.org/jira/browse/CASSANDRA-1788
>>
>> On Thu, Sep 22, 2011 at 11:16 PM, Greg Hinkle  wrote:
>>> I noticed that on the 0.8 branch the implementation of 
>>> MessagingService.sendOneWay is building up a DataOutputBuffer with a 
>>> default size of 128 bytes, but then sending it as the full buffer no matter 
>>> how many bytes the data takes. I believe it should be calling 
>>> DataOutputBuffer.asByteArray() or copying just up to the length() into the 
>>> ByteBuffer. This means it appears to be wasting around 40 to 80 bytes on 
>>> every message. This really adds up in a big cluster.
>>>
>>> It looks like things are different in trunk, but can anyone confirm this 
>>> bug in 0.8? Thanks.
>>>
>>>
>>> Greg Hinkle
>>>
>>>
>>
>>
>
>





Re: Storing (python) objects

2011-09-23 Thread Ian Danforth
Good feedback from all. Thanks!

Ian

On Fri, Sep 23, 2011 at 7:48 AM, Tristan Seligmann
wrote:

> On Fri, Sep 23, 2011 at 1:09 AM, Alexis Lê-Quôc  wrote:
> > For data accessed through a single path, I use the same trick: pickle,
> > bz2 and insert.
>
> Note that unpickling a pickle in Python involves a) arbitrary code
> execution, and b) relies on your code being the same (or close enough)
> to what it was when the pickle was created, so it is generally a very
> bad choice for persistent data serialization.
> --
> mithrandi, i Ainil en-Balandor, a faer Ambar
>


Re: Bulk uploader issue on multi-node cluster

2011-09-23 Thread Benoit Perroud
In the sstableloader config, make sure you have the seeds set, and rpc_address
and rpc_port pointing to your Cassandra instance (127.0.0.2).



2011/9/23 Thamizh 

> Hi All,
>
> I am using bulk loading to upload data (from lab02) to a multi-node cluster of
> 3 machines (lab02, lab03 & lab04), each with a single Ethernet card. I have set
> up the sstableloader instance on lab02 by duplicating the loopback address (sudo
> ifconfig lo:2 127.0.0.2 netmask 255.0.0.0 up;) and using "127.0.0.2" as its rpc
> and storage address. 'sstableloader' ended up with the error message below:
>
> Starting client (and waiting 30 seconds for gossip) ...
> java.lang.IllegalStateException: Cannot load any sstable, no live member
> found in the cluster
>
> In my case, does the lab02 machine need 2 Ethernet cards (one for the
> original Cassandra instance and another for 'sstableloader')?
>
> Regards,
> Thamizhannal
>


Bulk uploader issue on multi-node cluster

2011-09-23 Thread Thamizh
Hi All,

I am using bulk loading to upload data (from lab02) to a multi-node cluster of 3 
machines (lab02, lab03 & lab04), each with a single Ethernet card. I have set up the 
sstableloader instance on lab02 by duplicating the loopback address (sudo ifconfig lo:2 
127.0.0.2 netmask 255.0.0.0 up;) and using "127.0.0.2" as its rpc and storage address. 
'sstableloader' ended up with the error message below:

Starting client (and waiting 30 seconds for gossip) ...
java.lang.IllegalStateException: Cannot load any sstable, no live member found 
in the cluster

In my case, does the lab02 machine need 2 Ethernet cards (one for the 
original Cassandra instance and another for 'sstableloader')?

Regards,
Thamizhannal


Re: Storing (python) objects

2011-09-23 Thread Tristan Seligmann
On Fri, Sep 23, 2011 at 1:09 AM, Alexis Lê-Quôc  wrote:
> For data accessed through a single path, I use the same trick: pickle, bz2
> and insert.

Note that unpickling a pickle in Python involves a) arbitrary code
execution, and b) relies on your code being the same (or close enough)
to what it was when the pickle was created, so it is generally a very
bad choice for persistent data serialization.
-- 
mithrandi, i Ainil en-Balandor, a faer Ambar


Re: Storing (python) objects

2011-09-23 Thread Koert Kuipers
I would advise against using a language-specific storage format; you might
regret it later on if you want to add an application to your system that is
written in anything other than Python. I mean, Python is great, but it is not
necessarily the right tool for every job.

Look at Thrift/Protobuf/Avro/BSON/JSON.
I would use a serialization with an IDL.

On Fri, Sep 23, 2011 at 5:07 AM, David Allsopp  wrote:

> We have done exactly as you describe (nested dicts etc) - works fine as
> long as you are happy to read the whole lump of data, i.e. don't need to
> read at a finer granularity. This approach can also save a lot of storage
> space as you don't have the overhead of many small columns.
>
> Some folks also write JSON, which would be a bit more language-independent
> of course.
>
>
> On 22 September 2011 19:28, Ian Danforth  wrote:
>
>> All,
>>
>>  I find myself considering storing serialized python dicts in Cassandra.
>> I'd like to store fairly complex, nested dicts, and it's just easier to do
>> this rather than work out a lot of super columns / columns etc.
>>
>>  Do others find themselves storing serialized data structures in Cassandra
>> or is this generally a sign of doing something wrong?
>>
>>  Thanks in advance!
>>
>> Ian
>>
>
>


Re: user Digest 23 Sep 2011 12:49:40 -0000 Issue 1371

2011-09-23 Thread David Semeria


user-digest-h...@cassandra.apache.org wrote:

>
>user Digest 23 Sep 2011 12:49:40 -0000 Issue 1371
>
>Topics (messages 20995 through 21004):
>
>Re: shutdown by drain
>   20995 by: Radim Kolar
>   20998 by: Viktor Jevdokimov
>   21001 by: Sylvain Lebresne
>
>Re: How to enable JNA for Cassandra on Windows?
>   20996 by: Viktor Jevdokimov
>
>Re: Storing (python) objects
>   20997 by: David Allsopp
>
>Re: LevelDB type compaction
>   20999 by: Sam Overton
>
>please unsubscribe
>   21000 by: Vivek Mishra
>   21002 by: Jeremy Hanna
>
>Compression in v1.0
>   21003 by: David McNelis
>
>Re: Build Cassandra under Windows
>   21004 by: Jonathan Ellis
>
>Administrivia:
>
>-
>To post to the list, e-mail: user@cassandra.apache.org
>To unsubscribe, e-mail: user-digest-unsubscr...@cassandra.apache.org
>For additional commands, e-mail: user-digest-h...@cassandra.apache.org
>
>--
>


Re: MessagingService.sendOneWay sending blank bytes?

2011-09-23 Thread Greg Hinkle
Is it worth a backport, or at least a switch to asByteArray, for 0.8.7? It's a 
sizable amount of wasted network traffic and the fix seems pretty safe. (It's 
working for me.)

Greg Hinkle

On Sep 23, 2011, at 3:32 AM, Jonathan Ellis wrote:

> Yes.  This is one of the things fixed for 1.0 in
> https://issues.apache.org/jira/browse/CASSANDRA-1788
> 
> On Thu, Sep 22, 2011 at 11:16 PM, Greg Hinkle  wrote:
>> I noticed that on the 0.8 branch the implementation of 
>> MessagingService.sendOneWay is building up a DataOutputBuffer with a default 
>> size of 128 bytes, but then sending it as the full buffer no matter how many 
>> bytes the data takes. I believe it should be calling 
>> DataOutputBuffer.asByteArray() or copying just up to the length() into the 
>> ByteBuffer. This means it appears to be wasting around 40 to 80 bytes on 
>> every message. This really adds up in a big cluster.
>> 
>> It looks like things are different in trunk, but can anyone confirm this bug 
>> in 0.8? Thanks.
>> 
>> 
>> Greg Hinkle
>> 
>> 
> 
> 
> 



Re: How to enable JNA for Cassandra on Windows?

2011-09-23 Thread Jonathan Ellis
mmap is supported by the JDK; JNA is not required.
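As an illustration of the point that memory mapping needs only the JDK, java.nio's FileChannel.map returns a MappedByteBuffer without any native library. The sketch below maps a small temporary file read-only. It shows the standard API, not how Cassandra's mmap disk access mode is implemented. The missing msvcrt function Viktor mentions, mlockall, is a separate concern from mmap. The MmapDemo class name is hypothetical.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Reads a file through a JDK memory mapping; no JNA involved.
class MmapDemo {
    // Maps the whole file read-only and returns its contents as UTF-8 text.
    static String readViaMmap(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] bytes = new byte[mapped.remaining()];
            mapped.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-demo", ".bin");
        // deleteOnExit instead of an immediate delete: on Windows, deleting a
        // file can fail while a mapping of it is still reachable.
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "mapped without JNA".getBytes(StandardCharsets.UTF_8));
        System.out.println(readViaMmap(tmp)); // mapped without JNA
    }
}
```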

On Fri, Sep 23, 2011 at 5:07 AM, Viktor Jevdokimov <
viktor.jevdoki...@adform.com> wrote:

> I found that there's no C library under Windows, and msvcrt does not
> provide the mlockall function, so currently there's no way to use JNA under
> Windows. Does that mean mmap is not a good idea?
>
> Best regards/ Pagarbiai
>
> *Viktor Jevdokimov*
> Senior Developer
>
> Email:  viktor.jevdoki...@adform.com
> Phone: +370 5 212 3063. Fax: +370 5 261 0453
> J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
>
> Disclaimer: The information contained in this message and attachments is
> intended solely for the attention and use of the named addressee and may be
> confidential. If you are not the intended recipient, you are reminded that
> the information remains the property of the sender. You must not use,
> disclose, distribute, copy, print or rely on this e-mail. If you have
> received this message in error, please contact the sender immediately and
> irrevocably delete this message and any copies.
>
> *From:* Viktor Jevdokimov [mailto:vjevdoki...@gmail.com]
> *Sent:* Thursday, September 22, 2011 15:01
> *To:* user@cassandra.apache.org
> *Subject:* How to enable JNA for Cassandra on Windows?
>
> Hi,
>
> I'm trying, without success, to enable JNA for Cassandra on Windows.
>
> I tried placing the JNA 3.3.0 libs jna.jar and platform.jar into the
> Cassandra 0.8.6 lib dir, but I get this in the log:
>
> Unable to link C library. Native methods will be disabled.
>
> What is missing, or what is wrong?
>
> One thing I've found online about JNA and Windows is this sample:
>
> // Library is c for unix and msvcrt for windows
> String libName = "c";
> if (System.getProperty("os.name").contains("Windows"))
> {
>   libName = "msvcrt";
> }
>
> // Load the library dynamically
> CInterface demo = (CInterface) Native.loadLibrary(libName, CInterface.class);
>
> from http://www.scriptol.com/programming/jna.php
>
> while in Cassandra:
>
> try
> {
>     Native.register("c");
> }
> catch (NoClassDefFoundError e)
> {
>     logger.info("JNA not found. Native methods will be disabled.");
> }
> catch (UnsatisfiedLinkError e)
> {
>     logger.info("Unable to link C library. Native methods will be disabled.");
> }
> catch (NoSuchMethodError e)
> {
>     logger.warn("Obsolete version of JNA present; unable to register C library. Upgrade to JNA 3.2.7 or later");
> }
>
> Is it true that on Windows Cassandra should do something like:
>
> if (System.getProperty("os.name").contains("Windows"))
> {
>     Native.register("msvcrt");
> }
> else
> {
>     Native.register("c");
> }
>
> Thanks
>
> Viktor
>




Re: Build Cassandra under Windows

2011-09-23 Thread Jonathan Ellis
What was the target you were using that didn't work?

On Fri, Sep 23, 2011 at 5:00 AM, Viktor Jevdokimov <
viktor.jevdoki...@adform.com> wrote:

> Solved: I just used the appropriate Ant targets to get the jars built.
>
> *From:* Viktor Jevdokimov [mailto:vjevdoki...@gmail.com]
> *Sent:* Friday, September 23, 2011 10:02
> *To:* user@cassandra.apache.org
> *Subject:* Build Cassandra under Windows
>
> Hello,
>
> I'm trying, without success, to build the Cassandra 0.8 and 1.0.0 branches
> on Windows; I'm getting these errors:
>
> ...
> maven-ant-tasks-retrieve-build:
> [artifact:dependencies] Downloading: asm/asm/3.2/asm-3.2-sources.jar from
> repository central at http://repo1.maven.org/maven2
> [artifact:dependencies] Unable to locate resource in repository
> [artifact:dependencies] [INFO] Unable to find resource
> 'asm:asm:java-source:sources:3.2' in repository central
> (http://repo1.maven.org/maven2)
> [artifact:dependencies] Downloading: asm/asm/3.2/asm-3.2-sources.jar from
> repository apache at
> https://repository.apache.org/content/repositories/releases
> [artifact:dependencies] Unable to locate resource in repository
> ...
> and so on.
>
> I have checked build/build-dependencies.xml, and all the referenced files were
> downloaded to the local Maven repository (${user.home}/.m2/repository)
> successfully.
>
> Environment:
> Windows 7 Professional x64
> Ant 1.8.2
> JDK 1.6.0 b27
>
> I'm a .NET developer with no experience building Java projects with Ant.
>
> What have I missed?
>
> Thanks,
> Viktor
>




Compression in v1.0

2011-09-23 Thread David McNelis
I just read through the DataStax compression post (
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression).

My question is around good use cases for enabling compression.  In my
scenario I have very wide rows with many thousands of columns, where it's
essentially time-series information and the column names are all UTC
timestamps.  There is significant overlap between rows, but at times there is
wide variance in the number of columns in each row.  I would imagine
(based on what I'd read) that because there are a significant number of
columns in common, compression would be useful.  At the same time, there
is a lot of variance, and right now there are more columns than rows (though
that will change over time), which suggests compression might not be
useful.

Does anyone have any thoughts on what would make the most sense?  It would
be awesome if we could cut our storage needs consistently.
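One cheap way to get a feel for this before enabling anything is to compress a sample of representative data and look at the ratio. The sketch below uses java.util.zip.Deflater as a stand-in codec, which is not necessarily the codec Cassandra 1.0 uses. The real feature compresses chunks of SSTable data, so treat the result only as a rough indication. The synthetic row of per-minute UTC timestamp column names is hypothetical; a sample of your own rows will say far more.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Rough compressibility check using Deflater as a stand-in codec.
class CompressionEstimate {
    // Returns compressed size divided by raw size (lower means more compressible).
    // Assumes raw is non-empty.
    static double ratio(byte[] raw) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return (double) out.size() / raw.length;
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Hypothetical sample: one column per minute, named by its UTC epoch-millis
    // timestamp, with a small numeric value. Replace with real rows.
    static byte[] syntheticRow(long startMillis, int columns) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns; i++) {
            sb.append(startMillis + i * 60_000L).append('=').append(i % 100).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 1316736000000L is 2011-09-23T00:00Z in epoch milliseconds.
        byte[] row = syntheticRow(1316736000000L, 10_000);
        System.out.printf("raw=%d bytes, compressed/raw=%.2f%n", row.length, ratio(row));
    }
}
```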

-- 
*David McNelis*
Lead Software Engineer
Agentis Energy
www.agentisenergy.com
o: 630.359.6395
c: 219.384.5143

*A Smart Grid technology company focused on helping consumers of energy
control an often under-managed resource.*


Re: shutdown by drain

2011-09-23 Thread Sylvain Lebresne
On Fri, Sep 23, 2011 at 11:04 AM, Radim Kolar  wrote:
> On 10.9.2011 21:48, Chris Goffinet wrote:
>>
>> For things like rolling restarts, we do:
>>
>> disablethrift
>> disablegossip
>> (...wait for all nodes to see this node go down..)
>> drain
>
> I discovered a problem with this advice.
>
> If I run nodetool drain before killing the node, nodetool returns just after
> the flush and the disabling are finished on the Cassandra node. But the flush
> can trigger a compaction, and if you kill the node after the drain, it will
> interrupt the compaction in progress, resulting in wasted disk space. I am
> not sure if tmp files are cleaned up on Cassandra start.

While it's true that flush/drain may trigger compactions, and that killing the
node just after them will interrupt those compactions, it's not really a problem
with flush or drain, in that you always run the risk of interrupting a compaction
when killing a node (i.e., even if drain doesn't happen to trigger a compaction,
there may be a compaction that started before the drain and has not yet
finished when you kill the node). Besides, if you kill the node just after the
drain, the compactions it has triggered are probably not very far along yet,
so they are probably the least wasteful ones to interrupt.

Moreover, yes, tmp files are cleaned up on restart. So for a rolling restart,
which was the case Chris was talking about, this can hardly be called
wasted space.

Now, it is true that it would be a shame to interrupt a compaction that has
been running for a long time and is about to finish (so typically not one that
has just been triggered by your drain), but you can always check the
compaction manager in JMX to see if that's the case before killing the node.
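The "check before killing" step could be scripted roughly as follows. The sketch keeps the JMX side abstract. pendingCompactions is a hypothetical supplier that, in practice, would read a value from the node's CompactionManager MBean over JMX. Which MBean attribute to read is an assumption to verify against your Cassandra version. The CompactionWait helper is hypothetical too. It only polls until nothing is pending or a timeout expires, after which the operator decides whether to stop the node.

```java
import java.util.function.IntSupplier;

// Polls a pending-compaction count until it reaches zero or a timeout expires.
class CompactionWait {
    // Returns true if the count reached zero, false if we gave up.
    // pendingCompactions is assumed to be backed by a JMX read in real use.
    static boolean waitUntilIdle(IntSupplier pendingCompactions, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (pendingCompactions.getAsInt() == 0) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) { // overflow-safe comparison
                return false;
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake supplier for demonstration: reports 3, 2, 1, then 0 pending.
        int[] remaining = {3};
        boolean idle = waitUntilIdle(() -> remaining[0] > 0 ? remaining[0]-- : 0, 5_000, 10);
        System.out.println(idle ? "no compactions pending" : "compactions still running");
    }
}
```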

--
Sylvain


Re: LevelDB type compaction

2011-09-23 Thread Sam Overton
On 17 September 2011 00:58, mcasandra  wrote:

> >
> >> and updates could be scattered all over
> >> before compaction?
> >
> > No, updates to a given row will still be in a single sstable.
>
> Can you please explain a little more? Do you mean that if a Level 1 file
> contains the range 1-100, all the updates would still go into that file?
>

No, sstables are never modified once written. All updates go to a memtable,
which is then flushed to a Level0 sstable. Compaction then causes promotion of
updates to gradually higher levels.


> The link on leveldb says:
>
> > The compaction picks a file from level L and all overlapping files from
> > the next level L+1
>
> If all updates go to the same sstables, then how do overlapping files get
> generated? By overlapping, I am assuming it means a new or updated value for a
> given key exists in multiple files?
>

Overlapping refers to the range of keys. The key ranges contained within the
files in any given level form a set of disjoint intervals (for levels 1 and
above). In Level0, the files may contain overlapping ranges.
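The overlap rule can be made concrete with a small sketch. Each sstable is summarized by its smallest and largest key, and two files overlap when those closed ranges intersect. Compacting a file from level L pulls in every file in level L+1 whose range it overlaps. The LevelOverlap and SSTableRange classes are hypothetical simplifications that use plain string keys, whereas real keys are ordered by the partitioner. This is not Cassandra's or LevelDB's compaction code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LevelOverlap {
    // Key range [first, last] covered by one sstable, inclusive on both ends.
    static final class SSTableRange {
        final String name;
        final String first;
        final String last;

        SSTableRange(String name, String first, String last) {
            this.name = name;
            this.first = first;
            this.last = last;
        }

        // Two closed intervals intersect unless one ends before the other starts.
        boolean overlaps(SSTableRange other) {
            return first.compareTo(other.last) <= 0 && other.first.compareTo(last) <= 0;
        }
    }

    // Files in the next level that must be compacted together with 'picked'.
    static List<String> overlapping(SSTableRange picked, List<SSTableRange> nextLevel) {
        List<String> result = new ArrayList<>();
        for (SSTableRange candidate : nextLevel) {
            if (picked.overlaps(candidate)) {
                result.add(candidate.name);
            }
        }
        return result;
    }

    // True if no two files in the level overlap, the invariant for levels >= 1.
    static boolean isDisjoint(List<SSTableRange> level) {
        for (int i = 0; i < level.size(); i++) {
            for (int j = i + 1; j < level.size(); j++) {
                if (level.get(i).overlaps(level.get(j))) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<SSTableRange> level1 = Arrays.asList(
            new SSTableRange("L1-a", "a", "f"),
            new SSTableRange("L1-b", "g", "m"),
            new SSTableRange("L1-c", "n", "z"));
        SSTableRange fromLevel0 = new SSTableRange("L0-x", "e", "h");
        System.out.println(isDisjoint(level1));              // true
        System.out.println(overlapping(fromLevel0, level1)); // [L1-a, L1-b]
    }
}
```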


> Thanks for the explanation
>

Disclaimer: I haven't read the code; this is just my understanding from
reading the docs.


Sam Overton
Acunu | http://www.acunu.com | @acunu


RE: shutdown by drain

2011-09-23 Thread Viktor Jevdokimov
Moreover, under Windows, Cassandra 0.8.6 still leaves all the commit logs behind.


-----Original Message-----
From: Radim Kolar [mailto:h...@sendmail.cz]
Sent: Friday, September 23, 2011 12:04
To: user@cassandra.apache.org
Subject: Re: shutdown by drain

On 10.9.2011 21:48, Chris Goffinet wrote:
> For things like rolling restarts, we do:
>
> disablethrift
> disablegossip
> (...wait for all nodes to see this node go down..)
> drain

I discovered a problem with this advice.

If I run nodetool drain before killing the node, nodetool returns just after the flush 
and the disabling are finished on the Cassandra node. But the flush can trigger 
a compaction, and if you kill the node after the drain, it will interrupt the 
compaction in progress, resulting in wasted disk space. I am not sure if tmp 
files are cleaned up on Cassandra start.




Re: Storing (python) objects

2011-09-23 Thread David Allsopp
We have done exactly as you describe (nested dicts etc) - works fine as long
as you are happy to read the whole lump of data, i.e. don't need to read at
a finer granularity. This approach can also save a lot of storage space as
you don't have the overhead of many small columns.

Some folks also write JSON, which would be a bit more language-independent
of course.

On 22 September 2011 19:28, Ian Danforth  wrote:

> All,
>
>  I find myself considering storing serialized python dicts in Cassandra.
> I'd like to store fairly complex, nested dicts, and it's just easier to do
> this rather than work out a lot of super columns / columns etc.
>
>  Do others find themselves storing serialized data structures in Cassandra
> or is this generally a sign of doing something wrong?
>
>  Thanks in advance!
>
> Ian
>


RE: How to enable JNA for Cassandra on Windows?

2011-09-23 Thread Viktor Jevdokimov
I found that there's no C library under Windows, and msvcrt does not provide the 
mlockall function, so currently there's no way to use JNA under Windows. Does that 
mean mmap is not a good idea?






From: Viktor Jevdokimov [mailto:vjevdoki...@gmail.com]
Sent: Thursday, September 22, 2011 15:01
To: user@cassandra.apache.org
Subject: How to enable JNA for Cassandra on Windows?

Hi,

I'm trying, without success, to enable JNA for Cassandra on Windows.

I tried placing the JNA 3.3.0 libs jna.jar and platform.jar into the Cassandra 0.8.6 lib 
dir, but I get this in the log:
Unable to link C library. Native methods will be disabled.

What is missing, or what is wrong?

One thing I've found online about JNA and Windows is this sample:

// Library is c for unix and msvcrt for windows
String libName = "c";
if (System.getProperty("os.name").contains("Windows"))
{
  libName = "msvcrt";
}

// Load the library dynamically
CInterface demo = (CInterface) Native.loadLibrary(libName, CInterface.class);

from http://www.scriptol.com/programming/jna.php

while in Cassandra:

try
{
    Native.register("c");
}
catch (NoClassDefFoundError e)
{
    logger.info("JNA not found. Native methods will be disabled.");
}
catch (UnsatisfiedLinkError e)
{
    logger.info("Unable to link C library. Native methods will be disabled.");
}
catch (NoSuchMethodError e)
{
    logger.warn("Obsolete version of JNA present; unable to register C library. Upgrade to JNA 3.2.7 or later");
}

Is it true that on Windows Cassandra should do something like:

if (System.getProperty("os.name").contains("Windows"))
{
    Native.register("msvcrt");
}
else
{
    Native.register("c");
}

Thanks
Viktor

Re: shutdown by drain

2011-09-23 Thread Radim Kolar

On 10.9.2011 21:48, Chris Goffinet wrote:

> For things like rolling restarts, we do:
>
> disablethrift
> disablegossip
> (...wait for all nodes to see this node go down..)
> drain

I discovered a problem with this advice.

If I run nodetool drain before killing the node, nodetool returns just after 
the flush and the disabling are finished on the Cassandra node. But the flush can 
trigger a compaction, and if you kill the node after the drain, it will 
interrupt the compaction in progress, resulting in wasted disk space. I am 
not sure if tmp files are cleaned up on Cassandra start.


RE: Build Cassandra under Windows

2011-09-23 Thread Viktor Jevdokimov
Solved: I just used the appropriate Ant targets to get the jars built.





From: Viktor Jevdokimov [mailto:vjevdoki...@gmail.com]
Sent: Friday, September 23, 2011 10:02
To: user@cassandra.apache.org
Subject: Build Cassandra under Windows

Hello,

I'm trying, without success, to build the Cassandra 0.8 and 1.0.0 branches on 
Windows; I'm getting these errors:

...
maven-ant-tasks-retrieve-build:
[artifact:dependencies] Downloading: asm/asm/3.2/asm-3.2-sources.jar from 
repository central at http://repo1.maven.org/maven2
[artifact:dependencies] Unable to locate resource in repository
[artifact:dependencies] [INFO] Unable to find resource 
'asm:asm:java-source:sources:3.2' in repository central 
(http://repo1.maven.org/maven2)
[artifact:dependencies] Downloading: asm/asm/3.2/asm-3.2-sources.jar from 
repository apache at https://repository.apache.org/content/repositories/releases
[artifact:dependencies] Unable to locate resource in repository
...
and so on.

I have checked build/build-dependencies.xml, and all the referenced files were 
downloaded to the local Maven repository (${user.home}/.m2/repository) successfully.

Environment:
Windows 7 Professional x64
Ant 1.8.2
JDK 1.6.0 b27

I'm a .NET developer with no experience building Java projects with Ant.

What have I missed?


Thanks,
Viktor

Re: Performance degradation observed through embedded cassandra server - pointers needed

2011-09-23 Thread Roshan Dawrani
Thanks for sharing your inputs, Edward. Some comments inline below:

On Thu, Sep 22, 2011 at 7:31 PM, Edward Capriolo wrote:
>
>
>> 1) You should try to dig in and determine why the truncate is slower.
> Look for related JIRA issues on truncation.
>

I should give it a try. I thought I might get some ready-made pointers from
people who already know the 0.7.2 / 0.8.5 differences, i.e. whether our
approach of truncating for every test has gotten even worse due to some
changes in that area.


> Cassandra had some re-entrant code; you could fork a JVM for each test and use
> the CassandraServiceDataCleaner. (However, multiple startups could end up
> causing more overhead than the truncation.)
>
> I avoid this problem by using a different column family and/or a different
> keyspace for all my unit tests in a single class. Each class brings up a new
> embedded cluster and uses the data cleaner to sanitize the data directories.
> So essentially I never call truncate.
>

In both of these approaches, won't I need to rebuild the schema for every test
too? Certainly in the second case, if I end up creating a new keyspace or
different column families for each test. I am not sure what I would gain
there in terms of performance. I was hoping that truncating the data while
leaving the schema in place would be faster than that.

-- 
Roshan
Blog: http://roshandawrani.wordpress.com/
Twitter: @roshandawrani 
Skype: roshandawrani
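
Edward's approach avoids truncate by giving each test class a fresh embedded node with clean data directories. The snippet below is a minimal, hypothetical stand-in for that cleaning step only; it is not the real CassandraServiceDataCleaner, whose API is not shown in this thread. It recursively deletes and recreates a list of directories. The directory names (data, commitlog under a temp root) are assumptions; an embedded node's real paths come from its cassandra.yaml.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical stand-in for a per-test-class data cleaner;
// NOT the real CassandraServiceDataCleaner API.
class DataDirCleaner {

    // Recursively deletes dir if it exists, then recreates it empty.
    static void wipe(Path dir) throws IOException {
        if (Files.exists(dir)) {
            Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                    Files.delete(file);
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult postVisitDirectory(Path d, IOException exc) throws IOException {
                    if (exc != null) {
                        throw exc;
                    }
                    Files.delete(d); // empty once its children have been deleted
                    return FileVisitResult.CONTINUE;
                }
            });
        }
        Files.createDirectories(dir);
    }

    // Wipes every directory the embedded node is assumed to use.
    static void cleanAll(List<Path> dirs) throws IOException {
        for (Path d : dirs) {
            wipe(d);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed layout: data and commitlog directories under one temp root.
        Path root = Files.createTempDirectory("embedded-cassandra");
        Path data = root.resolve("data");
        Path commitlog = root.resolve("commitlog");
        Files.createDirectories(data.resolve("ks1"));
        Files.write(data.resolve("ks1").resolve("cf-1-Data.db"), new byte[] {1, 2, 3});

        // Run once per test class, before the embedded node starts.
        cleanAll(Arrays.asList(data, commitlog));

        try (Stream<Path> entries = Files.list(data)) {
            System.out.println("entries left in data dir: " + entries.count()); // 0
        }
    }
}
```

This only covers cleanup; whether a fresh node per class beats per-test truncation still depends on startup and schema-creation cost, which is exactly Roshan's question.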


Re: MessagingService.sendOneWay sending blank bytes?

2011-09-23 Thread Jonathan Ellis
Yes.  This is one of the things fixed for 1.0 in
https://issues.apache.org/jira/browse/CASSANDRA-1788

On Thu, Sep 22, 2011 at 11:16 PM, Greg Hinkle  wrote:
> I noticed that on the 0.8 branch the implementation of
> MessagingService.sendOneWay builds up a DataOutputBuffer with a default
> size of 128 bytes, but then sends the full buffer no matter how many
> bytes the data takes. I believe it should be calling
> DataOutputBuffer.asByteArray() or copying just up to the length() into the
> ByteBuffer. As it stands, it appears to waste around 40 to 80 bytes on
> every message. This really adds up in a big cluster.
>
> It looks like things are different in trunk, but can anyone confirm this bug
> in 0.8? Thanks.
>
>
> Greg Hinkle
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
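
The bug Greg describes is a general pattern: serializing into a growable buffer, then sending the whole backing array instead of only the bytes written. The sketch below reproduces it with java.io.ByteArrayOutputStream as a stand-in for Cassandra's DataOutputBuffer (an assumption; this is not the MessagingService code). The "buggy" path wraps the full 128-byte backing array; the "fixed" path copies only the written bytes, which is the "copy up to length()" fix Greg suggests.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

class PaddedBufferDemo {

    // Stand-in for a buffer class that exposes its backing array (assumption).
    static class ExposedBuffer extends ByteArrayOutputStream {
        ExposedBuffer(int initialSize) {
            super(initialSize);
        }

        byte[] getData() {
            return buf; // whole backing array, including the unused tail
        }

        int getLength() {
            return count; // number of bytes actually written
        }
    }

    // Serializes a payload into a buffer with a 128-byte initial capacity.
    static ExposedBuffer serialize(String payload) {
        ExposedBuffer out = new ExposedBuffer(128);
        try (DataOutputStream dos = new DataOutputStream(out)) {
            dos.writeUTF(payload); // 2-byte length prefix + modified UTF-8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    // Buggy: sends the full backing array, unused trailing bytes included.
    static ByteBuffer wrapWholeArray(ExposedBuffer out) {
        return ByteBuffer.wrap(out.getData());
    }

    // Fixed: toByteArray() copies only the bytes that were written.
    static ByteBuffer wrapWrittenBytes(ExposedBuffer out) {
        return ByteBuffer.wrap(out.toByteArray());
    }

    public static void main(String[] args) {
        ExposedBuffer out = serialize("a small message");
        System.out.println("bytes written: " + out.getLength());                  // 17
        System.out.println("buggy send:    " + wrapWholeArray(out).remaining());   // 128
        System.out.println("fixed send:    " + wrapWrittenBytes(out).remaining()); // 17
    }
}
```

With this 17-byte payload the buggy path would put 111 unused bytes on the wire; the real per-message waste depends on actual message sizes, which Greg estimated at 40 to 80 bytes.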


Re: is it possible for light-traffic CF to hold down many commit logs?

2011-09-23 Thread Yang
Thanks Sylvain, this is exactly what I need.




On Fri, Sep 23, 2011 at 12:10 AM, Sylvain Lebresne  wrote:
> In 1.0.0, you have:
>
> # Total space to use for commitlogs.
> # If space gets above this value (it will round up to the next nearest
> # segment multiple), Cassandra will flush every dirty CF in the oldest
> # segment and remove it.
> # commitlog_total_space_in_mb: 4096
>
> In 0.8, you're supposed to use the memtableFlushAfterMins property
> for each CF to avoid filling up your commit log partition. That is a
> little more involved, which is why we have improved it in 1.0.
>
> --
> Sylvain
>
>
> On Fri, Sep 23, 2011 at 7:47 AM, Yang  wrote:
>> thanks for the input.
>>
>> if that's the case, I think the solution would be to sort the CFs to
>> flush by a more complex criterion than just size. For example, the
>> number of dirty commit logs that contain a CF should be factored into
>> its score.
>>
>> Yang
>>
>> On Thu, Sep 22, 2011 at 10:40 PM, Philippe  wrote:
>>> It sure looks like what I'm seeing on my cluster, where a 100G commit log
>>> partition fills up in 12 hours (0.8.x).
>>>
>>> On 23 Sept. 2011 03:45, "Yang" wrote:
>>>> In 1.0.0 we don't have memtable_throughput for each individual CF;
>>>> instead, which memtable/CF to flush is determined by the largest
>>>> getTotalMemtableLiveSize() (MeteredFlusher.java line 81).
>>>>
>>>>
>>>> What would happen in the following case? I have only 2 CFs, and the
>>>> traffic for one CF is 1000 times that of the second CF, so the
>>>> high-traffic CF constantly triggers the total memory threshold, and
>>>> every time, the busy CF is flushed.
>>>>
>>>> But the light-traffic CF is never flushed (well, not until we have
>>>> flushed the busy CF about 1000 times). Now we are left with many commit
>>>> logs, each of them containing a few entries for the light-traffic table.
>>>> We have to keep these commit logs because these entries have not been
>>>> flushed to an sstable yet.
>>>>
>>>> Then there are 2 problems: 1) to persist the few records from the
>>>> light-traffic CF, you have to keep 1000 times as many commit logs as
>>>> necessary, taking up disk space; 2) when you recover on server restart,
>>>> you'll have to read through all those commit logs.
>>>>
>>>> Does the above hypothesis sound right?
>>>>
>>>> Thanks
>>>> Yang
>>>
>>
>
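
Yang's hypothesis can be checked with a toy model. The simulation below is a sketch under simplifying assumptions, not Cassandra code: one write costs one unit of memtable space, a commit log segment holds a fixed number of writes, and when the combined memtable size reaches a threshold the largest memtable is flushed (Yang's reading of MeteredFlusher). The log must be kept from the oldest segment that still holds unflushed data for some CF up to the current segment. All names and parameters are hypothetical.

```java
// Toy model of size-only memtable flushing (assumption: not Cassandra code).
class CommitLogPinningSim {

    // Returns how many commit log segments must still be kept after the run.
    // CF 0 is the busy CF; CF 1 gets one write after every busyPerLight busy writes.
    static int simulate(int totalWrites, int busyPerLight, int memThreshold, int segmentSize) {
        int[] memtable = new int[2];   // unflushed writes per CF (1 write = 1 unit)
        int[] dirtySince = {-1, -1};   // oldest segment with unflushed data, -1 = clean
        for (int w = 0; w < totalWrites; w++) {
            int segment = w / segmentSize;
            int cf = (w % (busyPerLight + 1) == busyPerLight) ? 1 : 0;
            memtable[cf]++;
            if (dirtySince[cf] < 0) {
                dirtySince[cf] = segment;
            }
            // Size-only policy: when the total crosses the threshold, flush the largest memtable.
            if (memtable[0] + memtable[1] >= memThreshold) {
                int largest = memtable[0] >= memtable[1] ? 0 : 1;
                memtable[largest] = 0;
                dirtySince[largest] = -1;
            }
        }
        // Segments from the oldest one still holding unflushed data up to the current one are kept.
        int current = (totalWrites - 1) / segmentSize;
        int oldestNeeded = current;
        for (int s : dirtySince) {
            if (s >= 0) {
                oldestNeeded = Math.min(oldestNeeded, s);
            }
        }
        return current - oldestNeeded + 1;
    }

    public static void main(String[] args) {
        System.out.println("kept, 1000:1 traffic: " + simulate(1_000_000, 1000, 10_000, 10_000));      // 100
        System.out.println("kept, busy CF only:   " + simulate(1_000_000, 2_000_000, 10_000, 10_000)); // 1
    }
}
```

With 1,000,000 writes, 10,000-write segments and a 10,000-write threshold, the light CF's first unflushed write sits in segment 0 and is never flushed, so all 100 segments stay on disk; with no light-CF traffic only the current segment is needed. This matches the shape of Yang's concern within the model's assumptions.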


Re: is it possible for light-traffic CF to hold down many commit logs?

2011-09-23 Thread Sylvain Lebresne
In 1.0.0, you have:

# Total space to use for commitlogs.
# If space gets above this value (it will round up to the next nearest
# segment multiple), Cassandra will flush every dirty CF in the oldest
# segment and remove it.
# commitlog_total_space_in_mb: 4096

In 0.8, you're supposed to use the memtableFlushAfterMins property
for each CF to avoid filling up your commit log partition. That is a
little more involved, which is why we have improved it in 1.0.

--
Sylvain


On Fri, Sep 23, 2011 at 7:47 AM, Yang  wrote:
> thanks for the input.
>
> if that's the case, I think the solution would be to sort the CFs to
> flush by a more complex criterion than just size. For example, the
> number of dirty commit logs that contain a CF should be factored into
> its score.
>
> Yang
>
> On Thu, Sep 22, 2011 at 10:40 PM, Philippe  wrote:
>> It sure looks like what I'm seeing on my cluster, where a 100G commit log
>> partition fills up in 12 hours (0.8.x).
>>
>> On 23 Sept. 2011 03:45, "Yang" wrote:
>>> In 1.0.0 we don't have memtable_throughput for each individual CF;
>>> instead, which memtable/CF to flush is determined by the largest
>>> getTotalMemtableLiveSize() (MeteredFlusher.java line 81).
>>>
>>>
>>> What would happen in the following case? I have only 2 CFs, and the
>>> traffic for one CF is 1000 times that of the second CF, so the
>>> high-traffic CF constantly triggers the total memory threshold, and
>>> every time, the busy CF is flushed.
>>>
>>> But the light-traffic CF is never flushed (well, not until we have
>>> flushed the busy CF about 1000 times). Now we are left with many commit
>>> logs, each of them containing a few entries for the light-traffic table.
>>> We have to keep these commit logs because these entries have not been
>>> flushed to an sstable yet.
>>>
>>> Then there are 2 problems: 1) to persist the few records from the
>>> light-traffic CF, you have to keep 1000 times as many commit logs as
>>> necessary, taking up disk space; 2) when you recover on server restart,
>>> you'll have to read through all those commit logs.
>>>
>>> Does the above hypothesis sound right?
>>>
>>> Thanks
>>> Yang
>>
>
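
Sylvain's commitlog_total_space_in_mb setting adds a second flush trigger. The toy model below is a sketch, not Cassandra's implementation, under these assumptions: one write costs one unit of memtable space, segments hold a fixed number of writes, the largest memtable is flushed when the combined size reaches a threshold, and the space cap is expressed in segments rather than megabytes. When the retained log exceeds the cap, every CF that is dirty in the oldest segment is flushed, mirroring the cassandra.yaml comment quoted in Sylvain's message. All names and parameters are hypothetical.

```java
// Toy model of a total commit log space cap (assumption: not Cassandra code).
class CommitLogCapSim {

    // Returns the maximum number of segments retained at any point during the run.
    // CF 0 is the busy CF; CF 1 gets one write after every busyPerLight busy writes.
    static int maxRetained(int totalWrites, int busyPerLight, int memThreshold,
                           int segmentSize, int maxSegments) {
        int[] memtable = new int[2];   // unflushed writes per CF
        int[] dirtySince = {-1, -1};   // oldest segment with unflushed data, -1 = clean
        int max = 0;
        for (int w = 0; w < totalWrites; w++) {
            int segment = w / segmentSize;
            int cf = (w % (busyPerLight + 1) == busyPerLight) ? 1 : 0;
            memtable[cf]++;
            if (dirtySince[cf] < 0) {
                dirtySince[cf] = segment;
            }
            // Size-based trigger: flush the largest memtable.
            if (memtable[0] + memtable[1] >= memThreshold) {
                int largest = memtable[0] >= memtable[1] ? 0 : 1;
                memtable[largest] = 0;
                dirtySince[largest] = -1;
            }
            // Space cap: flush every CF still dirty in the oldest retained segment.
            int oldest = oldestRetained(dirtySince, segment);
            if (segment - oldest + 1 > maxSegments) {
                for (int c = 0; c < 2; c++) {
                    if (dirtySince[c] >= 0 && dirtySince[c] <= oldest) {
                        memtable[c] = 0;
                        dirtySince[c] = -1;
                    }
                }
                oldest = oldestRetained(dirtySince, segment);
            }
            max = Math.max(max, segment - oldest + 1);
        }
        return max;
    }

    // Oldest segment that still holds unflushed data, or the current segment if none.
    static int oldestRetained(int[] dirtySince, int current) {
        int oldest = current;
        for (int s : dirtySince) {
            if (s >= 0) {
                oldest = Math.min(oldest, s);
            }
        }
        return oldest;
    }

    public static void main(String[] args) {
        System.out.println("cap 4:    " + maxRetained(1_000_000, 1000, 10_000, 10_000, 4));    // 4
        System.out.println("cap 1000: " + maxRetained(1_000_000, 1000, 10_000, 10_000, 1000)); // 100
    }
}
```

In this model a 4-segment cap forces the light CF to be flushed whenever it pins the oldest segment, so at most 4 segments are ever kept, while an effectively unlimited cap lets the light CF pin all 100.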


Build Cassandra under Windows

2011-09-23 Thread Viktor Jevdokimov
Hello,

I'm trying to build the Cassandra 0.8 and 1.0.0 branches on Windows without
success; I get the following errors:

...
maven-ant-tasks-retrieve-build:
[artifact:dependencies] Downloading: asm/asm/3.2/asm-3.2-sources.jar from repository central at http://repo1.maven.org/maven2
[artifact:dependencies] Unable to locate resource in repository
[artifact:dependencies] [INFO] Unable to find resource 'asm:asm:java-source:sources:3.2' in repository central (http://repo1.maven.org/maven2)
[artifact:dependencies] Downloading: asm/asm/3.2/asm-3.2-sources.jar from repository apache at https://repository.apache.org/content/repositories/releases
[artifact:dependencies] Unable to locate resource in repository
...
and so on.

I have checked build/build-dependencies.xml, and all the files it references
were downloaded successfully to the local Maven repository
(${user.home}/.m2/repository).

Environment:
Windows 7 Professional x64
Ant 1.8.2
JDK 1.6.0 b27

I'm a .NET developer with no experience building Java projects with Ant.

What have I missed?


Thanks,
Viktor