Re: Counters 2.1 Accuracy

2015-06-24 Thread Phil Yang
IMO, the main concern with C*'s counters is that they are not idempotent. For
example, if you increment a counter and get a timeout error, you cannot know
whether the increment succeeded. Non-counter writes are idempotent, so you
can simply retry them, but if you retry a counter update, the increment may
be applied twice.
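To make the double-write risk concrete, here is a toy Java model (not Cassandra or driver code; the class and method names are made up for illustration). The write reaches the replica, the acknowledgement is lost, and the client retries once. A regular write stores an absolute value, so replaying it is harmless; a counter update adds a delta, so replaying it counts twice.

```java
import java.util.function.LongUnaryOperator;

public class CounterRetryDemo {
    /**
     * Simulates one client request whose write is applied on the replica but
     * whose acknowledgement is lost on the first attempt, so the client
     * retries once. Returns the stored value after the retry.
     */
    static long applyWithOneRetry(long stored, LongUnaryOperator update) {
        int maxAttempts = 2;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            stored = update.applyAsLong(stored);           // the write reaches the replica
            boolean ackReceived = attempt == maxAttempts;  // first ack lost: client sees a timeout
            if (ackReceived) {
                break;
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        long regular = applyWithOneRetry(0, v -> 42);     // regular write: "set to 42"
        long counter = applyWithOneRetry(0, v -> v + 1);  // counter update: "add 1"
        System.out.println("regular = " + regular);  // regular = 42 (retry was harmless)
        System.out.println("counter = " + counter);  // counter = 2 (the +1 was applied twice)
    }
}
```

The model ignores replication entirely; it only shows why "set" can be replayed safely and "add" cannot.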

2015-06-23 12:23 GMT+08:00 Mike Trienis :

>
> Hi All,
>
> I'm fairly new to Cassandra and am planning on using it as a datastore for
> an Apache Spark cluster.
>
> The use case is fairly simple, read the raw data and perform aggregates
> and push the rolled up data back to Cassandra. The data models will use
> counters pretty heavily, so I'd like to understand what kind of accuracy
> I should expect from Cassandra 2.1 when incrementing counters.
>
> - http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters
>
> The blog post above states that the new counter implementations are
> "safer" although I'm not sure what that means in practice. Will the
> counters be 99.99% accurate? How often will they be over or under counted?
>
> Thanks, Mike.
>



-- 
Thanks,
Phil Yang


Re: Tables showing up as our_table-147a2090ed4211e480153bc81e542ebd/ in data dir

2015-04-29 Thread Phil Yang
See https://github.com/apache/cassandra/blob/cassandra-2.1/NEWS.txt#L77

Starting with 2.1, each SSTable data directory name has a hex string appended
after the CF (table) name.
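If you need to map one of these directories back to its table, the 32-character suffix looks like a time-based (version 1) UUID with the dashes removed. The sketch below assumes it is the table's ID (in 2.1 I believe this is the `cf_id` column of `system.schema_columnfamilies`); `TableDirName` and its methods are hypothetical helpers, not Cassandra code. Table names cannot contain `-`, so splitting at the last dash is safe.

```java
import java.util.UUID;

public class TableDirName {
    /** Table name part of "<table>-<32 hex chars>", or the whole name for the pre-2.1 layout. */
    static String tableName(String dirName) {
        int dash = dirName.lastIndexOf('-');
        return (dash < 0) ? dirName : dirName.substring(0, dash);
    }

    /** Re-inserts the dashes so the hex suffix can be read as a UUID (assumed to be the table ID). */
    static UUID tableId(String dirName) {
        String hex = dirName.substring(dirName.lastIndexOf('-') + 1);
        if (!hex.matches("[0-9a-fA-F]{32}")) {
            throw new IllegalArgumentException("no 32-char hex suffix: " + dirName);
        }
        String dashed = hex.substring(0, 8) + "-" + hex.substring(8, 12) + "-"
                + hex.substring(12, 16) + "-" + hex.substring(16, 20) + "-" + hex.substring(20);
        return UUID.fromString(dashed);
    }

    public static void main(String[] args) {
        String dir = "our_table-147a2090ed4211e480153bc81e542ebd";
        System.out.println(tableName(dir)); // our_table
        System.out.println(tableId(dir));   // 147a2090-ed42-11e4-8015-3bc81e542ebd
    }
}
```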

2015-04-29 13:04 GMT+08:00 Donald Smith :

>  Using 2.1.4, tables in our data/ directory are showing up as
>
>
>  our_table-147a2090ed4211e480153bc81e542ebd/
>
>
>  instead of as
>
>
>   our_table/
>
>
>  Why would that happen? We're also seeing lagging compactions and high
> cpu usage.
>
>
>   Thanks, Don
>



-- 
Thanks,
Phil Yang


Re: Creating 'Put' requests

2015-04-24 Thread Phil Yang
2015-04-23 22:16 GMT+08:00 Matthew Johnson :
>
> In HBase, we do something like:
>
> Put put = new Put(id);
> put.add(myPojo.getTimestamp(), myPojo.getValue());
> put.add(myPojo.getMySecondTimestamp(), myPojo.getSecondValue());
> server.put(put);
>
> Is there any similar mechanism in Cassandra Java driver for creating these
> inserts programmatically? Or, can the 'session.execute' take a list of
> commands so that each column can be inserted as its own insert statement
> but without the overhead of multiple calls to the server?
>
>

For your first question, do you mean the object-mapping API?
http://docs.datastax.com/en/developer/java-driver/2.1/java-driver/reference/crudOperations.html
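If the object mapper is more than you need, another option is to collect column/value pairs yourself, much like an HBase `Put`, and turn them into a prepared INSERT with one bind marker per column. `CqlPut` below is a hypothetical helper, not part of the driver (the driver's own `QueryBuilder.insertInto(...).value(...)` may also fit). Column names are concatenated unescaped, so they must come from trusted code.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of the driver): collects column/value pairs like an HBase Put. */
public class CqlPut {
    private final String table;
    // LinkedHashMap keeps insertion order, so column names and values line up.
    private final Map<String, Object> columns = new LinkedHashMap<>();

    public CqlPut(String table) {
        this.table = table;
    }

    /** Adds (or overwrites) one column value; returns this for chaining. */
    public CqlPut add(String column, Object value) {
        columns.put(column, value);
        return this;
    }

    /** INSERT text with one "?" bind marker per column. */
    public String cql() {
        String names = String.join(", ", columns.keySet());
        String markers = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + names + ") VALUES (" + markers + ")";
    }

    /** Values in the same order as the column names in cql(). */
    public Object[] values() {
        return columns.values().toArray();
    }
}
```

With the driver you would then prepare `put.cql()` and bind `put.values()` (`session.prepare(...)` followed by `.bind(...)`). Caching the `PreparedStatement` per distinct column set, rather than preparing on every call, avoids re-preparing the same query; note that `bind` expects Java types matching the CQL column types.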

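For the second question, C* can execute several statements in an unlogged
batch. However, because of the distributed nature of Cassandra (the
statements usually target different partitions, and therefore different
nodes, but a batch sends them all through a single coordinator), there is a
better solution; see
https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e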
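The idea in that article, as summarised here, is to send each insert as its own asynchronous request instead of one large batch, while capping how many requests are in flight. A minimal sketch of the throttling part follows; `writeAsync` is a stand-in for a real non-blocking driver call, and `ThrottledWriter` is a hypothetical name.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

public class ThrottledWriter {
    /**
     * Issues one asynchronous write per row, keeping at most maxInFlight writes
     * outstanding, then waits for all of them. writeAsync must not block.
     */
    static <T> void writeAll(List<T> rows,
                             Function<T, CompletableFuture<?>> writeAsync,
                             int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        CompletableFuture<?>[] pending = new CompletableFuture<?>[rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            permits.acquireUninterruptibly();  // wait while maxInFlight writes are outstanding
            pending[i] = writeAsync.apply(rows.get(i))
                    .whenComplete((result, error) -> permits.release()); // free a slot on success or failure
        }
        // Throws a CompletionException if any write failed.
        CompletableFuture.allOf(pending).join();
    }
}
```

In driver 2.1, `session.executeAsync(...)` returns a `ResultSetFuture` (a Guava `ListenableFuture`), not a `CompletableFuture`, so you would need a small adapter or a Guava callback that releases the permit.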




> Thanks!
> Matt
>
>
> -----Original Message-----
> From: Jim Witschey [mailto:jim.witsc...@datastax.com]
> Sent: 23 April 2015 14:46
> To: user@cassandra.apache.org
> Subject: Re: Creating 'Put' requests
>
> Are prepared statements what you're looking for?
>
>
> http://docs.datastax.com/en/developer/java-driver/2.1/java-driver/quick_start/qsSimpleClientBoundStatements_t.html
> Jim Witschey
>
> Software Engineer in Test | jim.witsc...@datastax.com
>
>
>
>
>
> On Thu, Apr 23, 2015 at 9:28 AM, Matthew Johnson 
> wrote:
> > Hi all,
> >
> >
> >
> > Currently looking at switching from HBase to Cassandra, and one big
> > difference so far is that in HBase, we create a ‘Put’ object, add to
> > it a set of column/value pairs, and send the Put to the server. So far
> > in Cassandra 2.1.4 the tutorials seem to suggest using CQL3, which I
> > really like for prototyping eg:
> >
> >
> >
> > session.execute("INSERT INTO simplex.playlists (id, song_id, title, album, artist) VALUES (1, 1, 'La Petite Tonkinoise', 'Bye Bye Blackbird', 'Joséphine Baker');");
> >
> >
> >
> > But for more complicated code this will quickly become unmanageable,
> > and doesn’t lend itself well to dynamically creating row data based on
> > various conditions. Is there a way to send a Java object, populated
> > with the desired column/value pairs, to the server instead of executing
> > an insert statement?
> > Would this require some other library, or does the DataStax Java
> > driver support this already?
> >
> >
> >
> > Thanks in advance,
> >
> > Matt
> >
> >
>



-- 
Thanks,
Phil Yang


Re: Is 2.1.5 ready for upgrade?

2015-04-22 Thread Phil Yang
I think it is acceptable to build the latest code from the cassandra-2.1
branch rather than wait for an official release, because the older 2.1.x
versions do have some serious issues. I did this in our cluster, and it
fixed the problems we had with 2.1.1.

2015-04-22 15:22 GMT+08:00 Nathan Bijnens :

> We had some serious issues with 2.1.3:
> - Bootstrapping a new node resulted in OOM
> - Repair resulted in an OOM on several nodes
> - When reading some parts of the data, it caused cascading crashes on all
> its replica nodes.
>
> Downgrading to the 2.0.X branch didn't work because of some
> incompatibilities, so we launched a new cluster and migrated all data.
>
> We will not be looking at 2.1 until we see some major resolved issues.
>
> IMHO if you don't need counters stick to the 2.0.X branch. DTCS is
> available from 2.0.11.
>
> N.
>
> On Tue, Apr 21, 2015 at 11:50 PM Brian Sam-Bodden <
> bsbod...@integrallis.com> wrote:
>
>> Robert,
>> Can you elaborate more please?
>>
>> Cheers,
>> Brian
>>
>>
>> On Tuesday, April 21, 2015, Robert Coli  wrote:
>>
>>> On Tue, Apr 21, 2015 at 2:25 PM, Dikang Gu  wrote:
>>>
>>>> We have some issues with streaming in 2.1.2. We find that there are a
>>>> lot of patches in 2.1.5. Is it ready for upgrade?
>>>>
>>>
>>> I personally would not run either version in production at this time,
>>> but if forced, would prefer 2.1.5 over 2.1.2.
>>>
>>> =Rob
>>>
>>>
>>
>>
>> --
>> Cheers,
>> Brian
>> http://www.integrallis.com
>>
>>


-- 
Thanks,
Phil Yang


Re: Getting " ParNew GC in ... CMS Old Gen ... " in logs

2015-04-22 Thread Phil Yang
A GC pause is only logged if it takes more than 200ms. You can use jstat to
see whether every young-generation GC takes this long; if so, you may need to
reduce the young generation size (HEAP_NEWSIZE) in conf/cassandra-env.sh to
shorten the pauses. Of course, this makes GC run more frequently, so there is
a trade-off.
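To see what a line like the one below actually says, it helps to subtract the before/after sizes of each pool. This sketch assumes the exact log format quoted in this thread and the 200ms threshold mentioned above; `GcLogLine` is a hypothetical helper, not Cassandra code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcLogLine {
    // Threshold as described in this thread; treat it as an assumption for your version.
    static final long LOG_THRESHOLD_MS = 200;

    private static final Pattern DURATION = Pattern.compile("GC in (\\d+)ms");
    private static final Pattern POOL = Pattern.compile("([A-Za-z ]+): (\\d+) -> (\\d+)");

    /** Pause length in ms, e.g. 248 for "ParNew GC in 248ms." */
    static long pauseMillis(String line) {
        Matcher m = DURATION.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no GC duration in: " + line);
        }
        return Long.parseLong(m.group(1));
    }

    static boolean exceedsLogThreshold(String line) {
        return pauseMillis(line) > LOG_THRESHOLD_MS;
    }

    /** Change in bytes (after minus before) for each memory pool named in the line. */
    static Map<String, Long> poolDeltas(String line) {
        Map<String, Long> deltas = new LinkedHashMap<>();
        Matcher m = POOL.matcher(line);
        while (m.find()) {
            deltas.put(m.group(1).trim(), Long.parseLong(m.group(3)) - Long.parseLong(m.group(2)));
        }
        return deltas;
    }

    public static void main(String[] args) {
        String line = "ParNew GC in 248ms.  CMS Old Gen: 453244264 -> 570471312; "
                + "Par Eden Space: 167712624 -> 0; Par Survivor Space: 0 -> 20970080";
        System.out.println(pauseMillis(line) + " ms, over threshold: " + exceedsLogThreshold(line)); // 248 ms, over threshold: true
        System.out.println(poolDeltas(line));
    }
}
```

For the line in this thread, Eden was emptied and the old generation grew by 117227048 bytes (about 112 MiB) during the 248ms pause, i.e. that much data was promoted in one young collection.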

2015-04-21 2:23 GMT+08:00 Anuj Wadehra :

> I meant 248 milli seconds
>
> --
>   *From*:"Anuj Wadehra" 
> *Date*:Mon, 20 Apr, 2015 at 11:41 pm
> *Subject*:Re: Getting " ParNew GC in ... CMS Old Gen ... " in logs
>
> I think this is just saying that young gen collection using the ParNew
> collector took 248 seconds. This is quite normal with CMS unless it happens
> too frequently, several times a second. I think query time has more to do
> with the read timeout in the yaml. Try increasing it. If it's a range query,
> please increase the range timeout in the yaml.
>
> Thanks
> Anuj Wadehra
>
> --
> *From*:"shahab" 
> *Date*:Mon, 20 Apr, 2015 at 9:59 pm
> *Subject*:Getting " ParNew GC in ... CMS Old Gen ... " in logs
>
> Hi,
>
> I keep getting the following line in the Cassandra logs, apparently
> something related to garbage collection. I guess this is one of the
> reasons why I do not get any response (I get a timeout) when I query a
> large volume of data?
>
>  ParNew GC in 248ms.  CMS Old Gen: 453244264 -> 570471312; Par Eden Space:
> 167712624 -> 0; Par Survivor Space: 0 -> 20970080
>
> Is the above line an indication of something that needs to be fixed in the
> system? How can I resolve this?
>
>
> best,
> /Shahab
>
>


-- 
Thanks,
Phil Yang


Re: Re-bootstrap node after disk failure

2015-03-25 Thread Phil Yang
Sorry, I misunderstood what you needed. You can replace the node with the
failed hard drive by following
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html

In your case, the node being replaced has the same IP/host as the "new
node" with the new hard drive, so you would set -Dcassandra.replace_address
to the node's own IP, as that page describes.

2015-03-25 13:46 GMT+08:00 Flavien Charlon :

> Is that what this command does? In that case the documentation is misleading
> because it says: "Use this command to bring up a new data center in an
> existing cluster", which is not really what I'm trying to do.
>
> On 24 March 2015 at 21:12, Phil Yang  wrote:
>
>> you can use "nodetool rebuild" in this node.
>>
>> 2015-03-25 9:20 GMT+08:00 Flavien Charlon :
>>
>>> Hi,
>>>
>>> What is the process to re-bootstrap a node after hard drive failure
>>> (Cassandra 2.1.3)?
>>>
>>> This is the same node as previously, but the data folder has been wiped,
>>> and I would like to re-bootstrap it from the data stored on the other nodes
>>> of the cluster (I have RF=3).
>>>
>>> I am not using vnodes.
>>>
>>> Thanks
>>> Flavien
>>>
>>
>>
>>
>> --
>> Thanks,
>> Phil Yang
>>
>>
>


-- 
Thanks,
Phil Yang


Re: Steps to do after schema changes

2015-03-11 Thread Phil Yang
Usually, there is nothing you need to do. Schema changes are propagated to
every node automatically.
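If you want to double-check that a change has reached every node before relying on it, one option is to compare the schema version each node reports: `nodetool describecluster` prints the schema versions, and in 2.1 `system.local` and `system.peers` have a `schema_version` column. The sketch below only does the comparison; how you collect the host-to-version map (e.g. `SELECT peer, schema_version FROM system.peers` plus the local row) is left as an assumption, and `SchemaAgreement` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

public class SchemaAgreement {
    /** True when every host reports the same schema version (and at least one host is present). */
    static boolean inAgreement(Map<String, UUID> schemaVersionByHost) {
        return new HashSet<>(schemaVersionByHost.values()).size() == 1;
    }

    /** Groups hosts by schema version, similar to the "Schema versions" section of describecluster. */
    static Map<UUID, Set<String>> hostsByVersion(Map<String, UUID> schemaVersionByHost) {
        Map<UUID, Set<String>> groups = new HashMap<>();
        schemaVersionByHost.forEach((host, version) ->
                groups.computeIfAbsent(version, v -> new HashSet<>()).add(host));
        return groups;
    }
}
```

A disagreement right after an ALTER is normal for a short while; it only needs attention if it persists.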

2015-03-12 13:21 GMT+08:00 Ajay :

> Hi,
>
> Are there any steps to do (like nodetool or restart node) or any
> precautions after schema changes are done in a column family say adding a
> new column or modifying any table properties?
>
> Thanks
> Ajay
>



-- 
Thanks,
Phil Yang


Re: Node stuck in joining the ring

2015-03-02 Thread Phil Yang
I have encountered a similar situation where streaming could not finish, not
only when joining a node but also when removing one. My workaround is to
restart every node in the cluster before starting the new node. In my
experience, stuck streaming only shows up on nodes that have been running for
many days, although I have no idea why.

2015-03-03 2:42 GMT+08:00 Nate McCall :

> Can you verify that cassandra-rackdc.properties and
> cassandra-topology.properties are the same on the cluster?
>
> On Thu, Feb 26, 2015 at 7:52 AM, Batranut Bogdan 
> wrote:
>
>> No errors in the system.log file
>> [root@cassa09 cassandra]# grep "ERROR" system.log
>> [root@cassa09 cassandra]#
>>
>> Nothing.
>>
>>
>>   On Thursday, February 26, 2015 1:55 PM, mck  wrote:
>>
>>
>> Any errors in your log file?
>>
>> We saw something similar when bootstrap crashed when rebuilding
>> secondary indexes.
>>
>> See CASSANDRA-8798
>>
>> ~mck
>>
>>
>>
>>
>
>
> --
> -----
> Nate McCall
> Austin, TX
> @zznate
>
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>



-- 
Thanks,
Phil Yang


What are the factors that affect the release time of each minor version?

2015-02-28 Thread Phil Yang
Hi all

As a Cassandra user, I sometimes run into bugs in my cluster and hope someone
can fix them (of course, if I can fix them myself, I'll try to contribute my
code :) ). Each bug has a JIRA ticket tracking it, so users can see whether
the bug has been fixed.

However, there is a lag between a bug being fixed and a new minor version
being released. Although we can apply the ticket's patch to the version we
run in production and build a custom snapshot, or use the latest code
directly, I think many users still prefer an official release, for its
higher reliability and, indeed, its convenience. In addition, upgrading more
frequently can also reduce the trouble caused by unknown bugs. So people
often ask, "When will the new version with this patch be released?"

As far as I can tell, neither the number of issues resolved in each version
nor the time interval between two versions is fixed. So may I ask what
factors affect the release time of each minor version?

Furthermore, apart from the vote on the dev@cassandra mailing list that I can
see, what work is involved in releasing a version? If it is not heavy work,
could we release more frequently? Or could we make a rule to decide when we
need to release a new version? For example: "If the latest version was
released at least two weeks ago, or if we have already resolved 20 issues
since the latest version, we should release a new minor version".
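The proposed rule is an "either/or" trigger; written as a predicate it looks like the sketch below. The thresholds come from the example above, and `ReleaseRule` is a hypothetical name, not anything the project uses.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class ReleaseRule {
    static final int DAYS_BETWEEN_RELEASES = 14;  // "two weeks" from the proposal
    static final int RESOLVED_ISSUES_TRIGGER = 20; // "20 issues" from the proposal

    /** True if either condition of the proposed rule is met. */
    static boolean shouldRelease(LocalDate lastRelease, LocalDate today, int resolvedSinceLastRelease) {
        long daysSince = ChronoUnit.DAYS.between(lastRelease, today);
        return daysSince >= DAYS_BETWEEN_RELEASES
                || resolvedSinceLastRelease >= RESOLVED_ISSUES_TRIGGER;
    }
}
```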

-- 
Thanks,
Phil Yang


Re: Counter Column

2014-12-27 Thread Phil Yang
Sorry for the typo: the timestamp Cassandra uses is independent of the
timezone.

It is usually recommended to run NTP to keep the clocks on all nodes in sync.
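A quick way to convince yourself: the same epoch-based value rendered in two zones gives different wall-clock strings but converts back to the identical number. What can differ between nodes is clock skew, not timezone, which is why NTP matters. The sketch also shows converting to microseconds, the unit conventionally used for Cassandra write timestamps; `TimestampZones` and its methods are illustrative names.

```java
import java.time.Instant;
import java.time.ZoneId;

public class TimestampZones {
    /** Renders an epoch-millisecond timestamp in a zone, then converts it back to epoch millis. */
    static long epochMillisViaZone(long epochMillis, String zone) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zone)).toInstant().toEpochMilli();
    }

    /** Local wall-clock time for the same instant; only this rendering depends on the zone. */
    static String wallClock(long epochMillis, String zone) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zone)).toLocalDateTime().toString();
    }

    /** Microseconds since the Unix epoch. */
    static long toMicros(Instant t) {
        return t.getEpochSecond() * 1_000_000L + t.getNano() / 1_000;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(wallClock(now, "Asia/Shanghai") + " vs " + wallClock(now, "America/Chicago"));
        System.out.println(epochMillisViaZone(now, "Asia/Shanghai")
                == epochMillisViaZone(now, "America/Chicago")); // true
    }
}
```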

2014-12-27 21:20 GMT+08:00 Phil Yang :

> In java,
> http://docs.oracle.com/javase/7/docs/api/java/lang/System.html#currentTimeMillis()
> return "the difference, measured in milliseconds, between the current time
> and midnight, January 1, 1970 UTC." It means the timestamp which Cassandra
> uses is not independent on the timezone.
>
> 2014-12-27 21:08 GMT+08:00 Ajay :
>
>> Thanks.
>>
>> I went through some articles which mentioned that the client should pass
>> the timestamp for inserts and updates. Is there any way we can avoid that
>> and have Cassandra use the server's current time?
>>
>> Thanks
>> Ajay
>> On Dec 26, 2014 10:50 PM, "Eric Stevens"  wrote:
>>
>>> Timestamps are timezone independent.  This is a property of timestamps,
>>> not a property of Cassandra. A given moment is the same timestamp
>>> everywhere in the world.  To display this in a human readable form, you
>>> then need to know what timezone you're attempting to represent the
>>> timestamp as, this is the information necessary to convert it to local time.
>>>
>>> On Fri, Dec 26, 2014 at 2:05 AM, Ajay  wrote:
>>>>
>>>> Hi,
>>>>
>>>> If the nodes of Cassandra ring are in different timezone, could it
>>>> affect the counter column as it depends on the timestamp?
>>>>
>>>> Thanks
>>>> Ajay
>>>>
>>>
>
>
> --
> Thanks,
> Phil Yang
>
>


-- 
Thanks,
Phil Yang

