Re: Counters 2.1 Accuracy
IMO, the main concern with C*'s counters is that increments are not idempotent. For example, if you increment a counter and get a timeout error, you cannot know whether the write succeeded. Non-counter writes are idempotent, so you can simply retry them, but retrying a counter increment may apply it twice.

2015-06-23 12:23 GMT+08:00 Mike Trienis:

> Hi All,
>
> I'm fairly new to Cassandra and am planning on using it as a datastore for
> an Apache Spark cluster.
>
> The use case is fairly simple: read the raw data, perform aggregates,
> and push the rolled-up data back to Cassandra. The data models will use
> counters pretty heavily, so I'd like to understand what kind of accuracy
> I should expect from Cassandra 2.1 when incrementing the counters.
>
> - http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters
>
> The blog post above states that the new counter implementations are
> "safer", although I'm not sure what that means in practice. Will the
> counters be 99.99% accurate? How often will they be over- or under-counted?
>
> Thanks, Mike.

--
Thanks,
Phil Yang
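The retry problem described in this reply can be illustrated with a small in-memory simulation. This is only a sketch: `FakeStore` and its methods are hypothetical stand-ins for a replica, not Cassandra or driver APIs, and the "timeout" is modelled as a write that is applied but whose acknowledgement is lost.

```java
import java.util.concurrent.atomic.AtomicLong;

class CounterRetryDemo {
    /** Hypothetical in-memory stand-in for stored values; not a Cassandra API. */
    static final class FakeStore {
        final AtomicLong counter = new AtomicLong(); // behaves like a counter column
        final AtomicLong regular = new AtomicLong(); // behaves like a regular column

        /** Applies the increment; returns false when the acknowledgement is "lost". */
        boolean increment(long delta, boolean ackLost) {
            counter.addAndGet(delta);
            return !ackLost;
        }

        /** Overwrites the value; returns false when the acknowledgement is "lost". */
        boolean set(long value, boolean ackLost) {
            regular.set(value);
            return !ackLost;
        }
    }

    /** First attempt is applied but times out from the client's view, so the client retries once. */
    static long incrementWithRetry(FakeStore store, long delta) {
        if (!store.increment(delta, true)) {
            store.increment(delta, false);
        }
        return store.counter.get();
    }

    /** Same retry policy for an overwrite, which is idempotent. */
    static long setWithRetry(FakeStore store, long value) {
        if (!store.set(value, true)) {
            store.set(value, false);
        }
        return store.regular.get();
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        System.out.println("counter after retried +1: " + incrementWithRetry(store, 1));
        System.out.println("regular after retried set(1): " + setWithRetry(store, 1));
    }
}
```

In this model the retried increment leaves the counter at 2 while the retried overwrite leaves the regular value at 1, which is the over-count risk the reply refers to.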
Re: Tables showing up as our_table-147a2090ed4211e480153bc81e542ebd/ in data dir
See https://github.com/apache/cassandra/blob/cassandra-2.1/NEWS.txt#L77: the SSTable data directory name will have a hex string appended after the CF name.

2015-04-29 13:04 GMT+08:00 Donald Smith:

> Using 2.1.4, tables in our data/ directory are showing up as
>
> our_table-147a2090ed4211e480153bc81e542ebd/
>
> instead of as
>
> our_table/
>
> Why would that happen? We're also seeing lagging compactions and high
> cpu usage.
>
> Thanks, Don

--
Thanks,
Phil Yang
Re: Creating 'Put' requests
2015-04-23 22:16 GMT+08:00 Matthew Johnson:

> In HBase, we do something like:
>
> Put put = new Put(id);
> put.add(myPojo.getTimestamp(), myPojo.getValue());
> put.add(myPojo.getMySecondTimestamp(), myPojo.getSecondValue());
> server.put(put);
>
> Is there any similar mechanism in the Cassandra Java driver for creating these
> inserts programmatically? Or can 'session.execute' take a list of
> commands, so that each column can be inserted as its own insert statement
> but without the overhead of multiple calls to the server?

For your first question, do you mean the object-mapping API?
http://docs.datastax.com/en/developer/java-driver/2.1/java-driver/reference/crudOperations.html

For the second question, C* can execute several statements in an unlogged batch. However, because of the distributed nature of Cassandra, there is a better solution; see
https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e

> Thanks!
> Matt
>
> -----Original Message-----
> From: Jim Witschey [mailto:jim.witsc...@datastax.com]
> Sent: 23 April 2015 14:46
> To: user@cassandra.apache.org
> Subject: Re: Creating 'Put' requests
>
> Are prepared statements what you're looking for?
>
> http://docs.datastax.com/en/developer/java-driver/2.1/java-driver/quick_start/qsSimpleClientBoundStatements_t.html
>
> Jim Witschey
> Software Engineer in Test | jim.witsc...@datastax.com
>
> On Thu, Apr 23, 2015 at 9:28 AM, Matthew Johnson wrote:
> > Hi all,
> >
> > Currently looking at switching from HBase to Cassandra, and one big
> > difference so far is that in HBase, we create a 'Put' object, add to
> > it a set of column/value pairs, and send the Put to the server. So far
> > in Cassandra 2.1.4 the tutorials seem to suggest using CQL3, which I
> > really like for prototyping, e.g.:
> >
> > session.execute("INSERT INTO simplex.playlists (id, song_id, title,
> > album, artist) VALUES (1,1,'La Petite Tonkinoise','Bye Bye
> > Blackbird','Joséphine Baker');");
> >
> > But for more complicated code this will quickly become unmanageable,
> > and doesn't lend itself well to dynamically creating row data based on
> > various conditions. Is there a way to send a Java object, populated
> > with the desired column/value pairs, to the server instead of executing
> > an insert statement?
> >
> > Would this require some other library, or does the DataStax Java
> > driver support this already?
> >
> > Thanks in advance,
> >
> > Matt

--
Thanks,
Phil Yang
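One way to approximate the 'Put'-style workflow asked about in this thread is to keep the column/value pairs in a map and generate a parameterized INSERT from it. The sketch below is pure Java and does not touch the driver: `InsertBuilderSketch`, `insertCql`, and `bindValues` are hypothetical helper names, and the table and column names are taken from the example in the question. With the DataStax Java driver, the generated string would typically be passed to `session.prepare(...)` and the values bound to the resulting statement; that step is left out here because it needs a running cluster.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InsertBuilderSketch {
    /** Builds "INSERT INTO table (c1, c2) VALUES (?, ?)" from the map's keys, in iteration order. */
    static String insertCql(String table, Map<String, Object> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        String names = String.join(", ", columns.keySet());
        String markers = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + names + ") VALUES (" + markers + ")";
    }

    /** Returns the values in the same order as the bind markers produced by insertCql. */
    static List<Object> bindValues(Map<String, Object> columns) {
        return new ArrayList<>(columns.values());
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so names and values stay aligned.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("song_id", 1);
        row.put("title", "La Petite Tonkinoise");
        System.out.println(insertCql("simplex.playlists", row));
        System.out.println(bindValues(row));
    }
}
```

Because table and column names are concatenated into the statement, they should come from code rather than from user input; only the values go through bind markers.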
Re: Is 2.1.5 ready for upgrade?
I think it is acceptable to build the latest code from the cassandra-2.1 branch rather than wait for the official release, because the older 2.1.x versions do have some serious issues. At least, I did this for our cluster, and the troubles we had with 2.1.1 were fixed.

2015-04-22 15:22 GMT+08:00 Nathan Bijnens:

> We had some serious issues with 2.1.3:
> - Bootstrapping a new node resulted in OOM
> - Repair resulted in an OOM on several nodes
> - Reading some parts of the data caused cascading crashes on all
>   its replica nodes.
>
> Downgrading to the 2.0.X branch didn't work because of some
> incompatibilities, so we launched a new cluster and migrated all data.
>
> We will not be looking at 2.1 until we see some major resolved issues.
>
> IMHO if you don't need counters stick to the 2.0.X branch. DTCS is
> available from 2.0.11.
>
> N.
>
> On Tue, Apr 21, 2015 at 11:50 PM Brian Sam-Bodden <bsbod...@integrallis.com> wrote:
>
>> Robert,
>> Can you elaborate more please?
>>
>> Cheers,
>> Brian
>>
>> On Tuesday, April 21, 2015, Robert Coli wrote:
>>
>>> On Tue, Apr 21, 2015 at 2:25 PM, Dikang Gu wrote:
>>>
>>>> We have some issues with streaming in 2.1.2. We find that there are a
>>>> lot of patches in 2.1.5. Is it ready for upgrade?
>>>
>>> I personally would not run either version in production at this time,
>>> but if forced, would prefer 2.1.5 over 2.1.2.
>>>
>>> =Rob
>>
>> --
>> Cheers,
>> Brian
>> http://www.integrallis.com

--
Thanks,
Phil Yang
Re: Getting " ParNew GC in ... CMS Old Gen ... " in logs
A GC is logged only if it takes more than 200 ms. You can use jstat to check whether every young-gen GC takes this long. If so, you may need to reduce the size of the young generation in conf/cassandra-env.sh to shorten the pause time. Of course, this will make GC trigger more frequently, so there is a trade-off.

2015-04-21 2:23 GMT+08:00 Anuj Wadehra:

> I meant 248 milliseconds.
>
> --
> *From*: "Anuj Wadehra"
> *Date*: Mon, 20 Apr, 2015 at 11:41 pm
> *Subject*: Re: Getting " ParNew GC in ... CMS Old Gen ... " in logs
>
> I think this is just saying that a young-gen collection using the ParNew
> collector took 248 seconds. This is quite normal with CMS unless it happens
> too frequently, several times a second. I think the query time has more to do
> with the read timeout in the yaml. Try increasing it. If it's a range query, then
> please increase the range timeout in the yaml.
>
> Thanks
> Anuj Wadehra
>
> --
> *From*: "shahab"
> *Date*: Mon, 20 Apr, 2015 at 9:59 pm
> *Subject*: Getting " ParNew GC in ... CMS Old Gen ... " in logs
>
> Hi,
>
> I keep getting the following line in the Cassandra logs, apparently
> something related to garbage collection. And I guess this is one of the
> signs of why I do not get any response (I get a time-out) when I query a large
> volume of data?
>
> ParNew GC in 248ms. CMS Old Gen: 453244264 -> 570471312; Par Eden Space:
> 167712624 -> 0; Par Survivor Space: 0 -> 20970080
>
> Is the above line an indication of something that needs to be fixed in the
> system? How can I resolve this?
>
> best,
> /Shahab

--
Thanks,
Phil Yang
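To judge how often such pauses occur and how quickly the old generation is filling, the log line quoted in this thread can be parsed. The following is a minimal sketch, assuming the line keeps the shape shown above; `GcLogLineSketch` and its regular expression are illustrative and not part of Cassandra.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcLogLineSketch {
    // Matches e.g. "ParNew GC in 248ms. CMS Old Gen: 453244264 -> 570471312;"
    private static final Pattern LINE = Pattern.compile(
            "(\\w+) GC in (\\d+)ms\\.\\s+CMS Old Gen: (\\d+) -> (\\d+);");

    /** Returns {pauseMillis, oldGenGrowthBytes}, or null if the line has a different shape. */
    static long[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        long pauseMillis = Long.parseLong(m.group(2));
        long growthBytes = Long.parseLong(m.group(4)) - Long.parseLong(m.group(3));
        return new long[] {pauseMillis, growthBytes};
    }

    public static void main(String[] args) {
        String line = "ParNew GC in 248ms. CMS Old Gen: 453244264 -> 570471312; "
                + "Par Eden Space: 167712624 -> 0; Par Survivor Space: 0 -> 20970080";
        long[] parsed = parse(line);
        System.out.println("pause ms: " + parsed[0] + ", old gen growth bytes: " + parsed[1]);
    }
}
```

For the quoted line this yields a 248 ms pause and 117227048 bytes promoted into the old generation during that collection.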
Re: Re-bootstrap node after disk failure
Sorry, I misunderstood your need. You can replace the node that had the hard drive failure by following http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html . In your case, the node being replaced has the same IP/host as the "new node" with the new hard drive.

2015-03-25 13:46 GMT+08:00 Flavien Charlon:

> Is that what this command does? In that case the documentation is misleading
> because it says: "Use this command to bring up a new data center in an
> existing cluster", which is not really what I'm trying to do.
>
> On 24 March 2015 at 21:12, Phil Yang wrote:
>
>> you can use "nodetool rebuild" in this node.
>>
>> 2015-03-25 9:20 GMT+08:00 Flavien Charlon:
>>
>>> Hi,
>>>
>>> What is the process to re-bootstrap a node after hard drive failure
>>> (Cassandra 2.1.3)?
>>>
>>> This is the same node as previously, but the data folder has been wiped,
>>> and I would like to re-bootstrap it from the data stored on the other nodes
>>> of the cluster (I have RF=3).
>>>
>>> I am not using vnodes.
>>>
>>> Thanks
>>> Flavien
>>
>> --
>> Thanks,
>> Phil Yang

--
Thanks,
Phil Yang
Re: Re-bootstrap node after disk failure
You can use "nodetool rebuild" on this node.

2015-03-25 9:20 GMT+08:00 Flavien Charlon:

> Hi,
>
> What is the process to re-bootstrap a node after hard drive failure
> (Cassandra 2.1.3)?
>
> This is the same node as previously, but the data folder has been wiped,
> and I would like to re-bootstrap it from the data stored on the other nodes
> of the cluster (I have RF=3).
>
> I am not using vnodes.
>
> Thanks
> Flavien

--
Thanks,
Phil Yang
Re: Steps to do after schema changes
Usually, you don't have to do anything. Changes will be synced to every node automatically.

2015-03-12 13:21 GMT+08:00 Ajay:

> Hi,
>
> Are there any steps to do (like nodetool or restarting a node) or any
> precautions to take after schema changes are made to a column family, say adding a
> new column or modifying any table properties?
>
> Thanks
> Ajay

--
Thanks,
Phil Yang
Re: Node stuck in joining the ring
I encountered a similar situation in which streaming could not finish, not only when joining but also when removing a node. My workaround is to restart every node in the cluster before starting the new node. In my experience, stuck streaming only shows up on nodes that have been running for many days, although I have no idea why.

2015-03-03 2:42 GMT+08:00 Nate McCall:

> Can you verify that cassandra-rackdc.properties and
> cassandra-topology.properties are the same on the cluster?
>
> On Thu, Feb 26, 2015 at 7:52 AM, Batranut Bogdan wrote:
>
>> No errors in the system.log file
>> [root@cassa09 cassandra]# grep "ERROR" system.log
>> [root@cassa09 cassandra]#
>>
>> Nothing.
>>
>> On Thursday, February 26, 2015 1:55 PM, mck wrote:
>>
>> Any errors in your log file?
>>
>> We saw something similar when bootstrap crashed when rebuilding
>> secondary indexes.
>>
>> See CASSANDRA-8798
>>
>> ~mck
>
> --
> Nate McCall
> Austin, TX
> @zznate
>
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com

--
Thanks,
Phil Yang
What are the factors that affect the release time of each minor version?
Hi all,

As a user of Cassandra, I sometimes hit bugs in my cluster and hope someone can fix them (of course, if I can fix them myself, I'll try to contribute my code :) ). Each bug has a JIRA ticket tracking it, so users can tell whether the bug has been fixed. However, there is a lag between a bug being fixed and a new minor version being released. Although we can apply the ticket's patch to the version we run in production and build a special snapshot to solve the trouble in our clusters, or use the latest code directly, I think many users still want an official release, which offers higher reliability and, indeed, more convenience. In addition, upgrading more frequently can also reduce the trouble caused by unknown bugs. So people often ask, "When will the new version with this patch be released?"

As far as I can tell, neither the number of issues resolved in each version nor the time interval between two versions is fixed. So may I ask what factors affect the release time of each minor version? Furthermore, apart from the vote on the dev@cassandra mailing list, which I can see, what work is involved in releasing a version? If it is not heavy work, could we release more frequently? Or could we make a rule to decide when a new version is needed? For example: "If the latest version was released two weeks ago, or we have already resolved 20 issues since the latest version, we should release a new minor version."

--
Thanks,
Phil Yang
Re: Counter Column
Sorry for the typo: the timestamp that Cassandra uses is independent of the timezone. It is usually recommended to use NTP to reduce the clock differences between nodes.

2014-12-27 21:20 GMT+08:00 Phil Yang:

> In java,
> http://docs.oracle.com/javase/7/docs/api/java/lang/System.html#currentTimeMillis()
> return "the difference, measured in milliseconds, between the current time
> and midnight, January 1, 1970 UTC." It means the timestamp which Cassandra
> uses is not independent on the timezone.
>
> 2014-12-27 21:08 GMT+08:00 Ajay:
>
>> Thanks.
>>
>> I went through some articles which mentioned that the client should pass the
>> timestamp for insert and update. Is there any way we can avoid it and have
>> Cassandra assume the current time of the server?
>>
>> Thanks
>> Ajay
>>
>> On Dec 26, 2014 10:50 PM, "Eric Stevens" wrote:
>>
>>> Timestamps are timezone independent. This is a property of timestamps,
>>> not a property of Cassandra. A given moment is the same timestamp
>>> everywhere in the world. To display this in a human-readable form, you
>>> then need to know what timezone you're attempting to represent the
>>> timestamp in; this is the information necessary to convert it to local time.
>>>
>>> On Fri, Dec 26, 2014 at 2:05 AM, Ajay wrote:
>>>>
>>>> Hi,
>>>>
>>>> If the nodes of a Cassandra ring are in different timezones, could it
>>>> affect the counter column, as it depends on the timestamp?
>>>>
>>>> Thanks
>>>> Ajay
>
> --
> Thanks,
> Phil Yang

--
Thanks,
Phil Yang
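The timezone independence discussed in this thread can be checked with plain Java. The sketch below takes one `System.currentTimeMillis()` value and renders it in two zones; `EpochMillisSketch` and `render` are illustrative names, and the two zone IDs are arbitrary choices.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

class EpochMillisSketch {
    /** Renders the same instant in the given zone; only the displayed wall-clock time changes. */
    static ZonedDateTime render(long epochMillis, String zone) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zone));
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis(); // milliseconds since 1970-01-01T00:00:00Z
        ZonedDateTime shanghai = render(now, "Asia/Shanghai");
        ZonedDateTime chicago = render(now, "America/Chicago");

        // Different local representations of the same moment...
        System.out.println(shanghai + " / " + chicago);
        // ...that map back to the identical epoch value.
        System.out.println(shanghai.toInstant().toEpochMilli() == chicago.toInstant().toEpochMilli());
    }
}
```

What can differ between nodes is the clock itself rather than the zone, which is why the reply recommends NTP.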
Re: Counter Column
In Java, http://docs.oracle.com/javase/7/docs/api/java/lang/System.html#currentTimeMillis() returns "the difference, measured in milliseconds, between the current time and midnight, January 1, 1970 UTC." It means the timestamp which Cassandra uses is not independent on the timezone.

2014-12-27 21:08 GMT+08:00 Ajay:

> Thanks.
>
> I went through some articles which mentioned that the client should pass the
> timestamp for insert and update. Is there any way we can avoid it and have
> Cassandra assume the current time of the server?
>
> Thanks
> Ajay
>
> On Dec 26, 2014 10:50 PM, "Eric Stevens" wrote:
>
>> Timestamps are timezone independent. This is a property of timestamps,
>> not a property of Cassandra. A given moment is the same timestamp
>> everywhere in the world. To display this in a human-readable form, you
>> then need to know what timezone you're attempting to represent the
>> timestamp in; this is the information necessary to convert it to local time.
>>
>> On Fri, Dec 26, 2014 at 2:05 AM, Ajay wrote:
>>>
>>> Hi,
>>>
>>> If the nodes of a Cassandra ring are in different timezones, could it
>>> affect the counter column, as it depends on the timestamp?
>>>
>>> Thanks
>>> Ajay

--
Thanks,
Phil Yang