Re: Counter Column

2014-12-27 Thread Ajay
Thanks.

I went through some articles which mentioned that the client has to pass the
timestamp for inserts and updates. Is there any way we can avoid that and have
Cassandra assume the current time of the server?

Thanks
Ajay
On Dec 26, 2014 10:50 PM, Eric Stevens migh...@gmail.com wrote:

 Timestamps are timezone independent.  This is a property of timestamps,
 not a property of Cassandra. A given moment is the same timestamp
 everywhere in the world.  To display this in a human readable form, you
 then need to know what timezone you're attempting to represent the
 timestamp in; this is the information necessary to convert it to local time.

 On Fri, Dec 26, 2014 at 2:05 AM, Ajay ajay.ga...@gmail.com wrote:

 Hi,

 If the nodes of a Cassandra ring are in different timezones, could it affect
 the counter column, as it depends on the timestamp?

 Thanks
 Ajay




Re: any code to load large data from web into Cassandra

2014-12-27 Thread Jack Krupansky
Sorry, but you are still not being clear. In particular, website data has
no common, defined meaning. You'll need to use some standard, defined
terminology or specific examples so that we can have some idea what you are
referring to.

The blog post you cited is referring to the Twitter API, presumably to read
tweets. Okay, fine, but you'll have to be more specific about what you want
to do with them. Yes, Cassandra is primarily focused on structured data, but
you can of course store unstructured and semi-structured data as blobs,
JSON strings, map columns, etc.
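
As a rough illustration of the JSON-string and blob options, here is a minimal
Python sketch. The tweet-like record and its field names are made up for the
example; the resulting string would go into a text column and the bytes into a
blob column of whatever table you define.

```python
import json

# Hypothetical semi-structured record, e.g. something returned by a web API.
tweet = {"id": 1, "user": "example", "text": "hello", "tags": ["cassandra"]}

# Serialize to a JSON string (suitable for a text column).
as_text = json.dumps(tweet, sort_keys=True)

# Encode the same string as bytes (suitable for a blob column).
as_blob = as_text.encode("utf-8")

# Reading it back is the reverse: decode the bytes, then parse the JSON.
restored = json.loads(as_blob.decode("utf-8"))
```

The trade-off is that Cassandra treats the value as opaque, so you cannot
query on fields inside it.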

Please describe in a little more detail what problem you are trying to
solve.

I mean, website data might mean any data (in any format) stored at a web
URL, which might be a web page, a data file linked by a web page, or...
it could be a REST API (like Twitter). Or it could be... whatever. Cassandra
is basically a storage engine - it can store anything. There is a wide
variety of tools that can be used to ingest data from the infinite
variety of sources for data. But you'll need to state more specifically
what you are actually trying to accomplish.

Also, large data could be... anything, like Big Data. So more
specificity is needed.

Alternatively, you could hire a consultant to help guide you through the
application analysis process to determine your application
requirements, and then you could simply post your application
requirements, or at least a concise summary or relevant excerpt.

-- Jack Krupansky



On Sat, Dec 27, 2014 at 1:48 AM, Joanne Contact joannenetw...@gmail.com
wrote:

 Thank you. I did not express my question clearly.

 I wonder if there is sample code to load any website data into Cassandra?

 Say, this webpage http://datatomix.com/?p=84 seems to use Python and tweepy
 to call the Twitter API, get data in JSON format, and then load the data into
 Cassandra.

 So it seems tweepy is specific to the Twitter API. Is there code for any
 website?
 Btw I am not familiar with Python yet, so the answer need not be limited to
 Python.

 Thanks!

 On Fri, Dec 26, 2014 at 12:46 PM, Keith Sterling 
 keith.sterl...@first-utility.com wrote:

 Take a look at sstableloader. We use it to load 30+m rows into Cassandra.

 The DataStax documentation is a good start.

 --
 *Keith Sterling*
 *Head of Software*

  *E:* keith.sterl...@first-utility.com stephen.l...@first-utility.com
  *P:* +44 7771 597 630
  *W:* first-utility.com http://www.first-utility.com/
  *A:* Opus 40 Business Park,
 Haywood Road, Warwick CV34 5AH



 On Fri, Dec 26, 2014 at 7:59 PM, Joanne Contact joannenetw...@gmail.com
 wrote:

  Hello, I am new. I could not find the answer after some brief
 research. Please help.

 Thanks!

 J






Re: Counter Column

2014-12-27 Thread Phil Yang
In Java, System.currentTimeMillis()
(http://docs.oracle.com/javase/7/docs/api/java/lang/System.html#currentTimeMillis())
returns the difference, measured in milliseconds, between the current time
and midnight, January 1, 1970 UTC. It means the timestamp which Cassandra
uses is not independent of the timezone.

-- 
Thanks,
Phil Yang


Re: Counter Column

2014-12-27 Thread Phil Yang
Sorry for the typo: the timestamp which Cassandra uses is independent of the
timezone.

Usually, it is recommended to use NTP to reduce the clock difference
between nodes.
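
To illustrate the point, here is a minimal Python sketch (Python rather than
Java only to keep it short). The epoch value is arbitrary and picked for the
example; the same number of milliseconds since 1970-01-01 UTC is rendered at
two fixed UTC offsets, and only the display changes.

```python
from datetime import datetime, timedelta, timezone

# One instant, as milliseconds since 1970-01-01T00:00:00 UTC.
# Arbitrary example value (2014-12-27T13:20:00 UTC).
epoch_millis = 1419686400000


def render(millis: int, utc_offset_hours: float) -> str:
    """Format an epoch timestamp for display at a fixed UTC offset."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(millis / 1000, tz=zone).isoformat()


utc_view = render(epoch_millis, 0)      # how a node set to UTC would display it
beijing_view = render(epoch_millis, 8)  # how a node set to UTC+8 would display it
```

Both strings describe the same moment, so nodes in different timezones agree
on the stored value as long as their clocks are in sync, which is where NTP
comes in.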

-- 
Thanks,
Phil Yang


Re: any code to load large data from web into Cassandra

2014-12-27 Thread Keith Sterling
Check out this DataStax article:

http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsBulkloader_t.html

Code examples can be found here:

https://github.com/PatrickCallaghan/datastax-bulkloader-writer-example

You can write a writer in Scala or Java which will convert CSV etc. into
SSTables, and then use sstableloader to load them directly into Cassandra.

K

-- 
Keith Sterling
Head of Software

E: keith.sterl...@first-utility.com
P: +44 7771 597 630
W: first-utility.com
A: Opus 40 Business Park,
Haywood Road, Warwick CV34 5AH



Re: Counter Column

2014-12-27 Thread Eric Stevens
Having the client pass the timestamp is optional; if you do not provide one
from the client, Cassandra will use the server's timestamp.
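
A minimal sketch of the two forms, assuming a hypothetical ordinary
(non-counter) table demo.kv with text columns k and v. The helper only builds
the CQL text; the optional USING TIMESTAMP clause takes microseconds since the
epoch. As far as I know, counter updates do not accept a client-supplied
timestamp at all, so this applies to regular columns only.

```python
import time
from typing import Optional


def build_insert(key: str, value: str, timestamp_micros: Optional[int] = None) -> str:
    """Return a CQL INSERT for the hypothetical table demo.kv.

    Values are interpolated directly for readability only; real code
    should use bound parameters from a client driver instead.
    """
    cql = f"INSERT INTO demo.kv (k, v) VALUES ('{key}', '{value}')"
    if timestamp_micros is not None:
        # Client-supplied write timestamp, in microseconds since the epoch.
        cql += f" USING TIMESTAMP {timestamp_micros}"
    return cql


# No timestamp given: the server assigns its own current time.
server_side = build_insert("a", "1")

# Timestamp given: the client's clock decides the write time.
client_side = build_insert("a", "1", timestamp_micros=int(time.time() * 1_000_000))
```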



Re: any code to load large data from web into Cassandra

2014-12-27 Thread Eric Stevens
I think Joanne is talking not about bulk loading, but about just general
access as in any standard client driver.

Joanne, this is a pretty broad topic. You would need to have some part of a
website built in some language such as Python or Java or some other
language. Then you would use an appropriate client driver for the
programming language you used for the rest of your website.

If you are just getting started with programming websites, I would start
first with making one which doesn't use a database at all, and once you can
submit a form and see the data which you submitted, then try to find a
client driver for your language and insert that data into your database.

A contact form is usually a good place to start as it is fairly simple.
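
As a rough sketch of that last step in Python, assuming a hypothetical table
site.contact_messages and form fields named email, name and message: the
function below turns a submitted form body into the values you would bind to
a prepared INSERT with whichever client driver you pick. The driver call
itself is left out on purpose.

```python
from urllib.parse import parse_qs

# Hypothetical table; the ? markers are placeholders for bound values.
INSERT_CQL = (
    "INSERT INTO site.contact_messages (email, name, message) VALUES (?, ?, ?)"
)


def form_to_params(body: str) -> tuple:
    """Turn a URL-encoded form body into bind values for INSERT_CQL."""
    fields = parse_qs(body)
    # parse_qs returns a list per field; take the first value, or "" if absent.
    return tuple(fields.get(name, [""])[0] for name in ("email", "name", "message"))


# Example body as a browser would submit it from a simple contact form.
params = form_to_params("name=Joanne&email=j%40example.com&message=Hello+there")
```

Binding the values instead of pasting them into the CQL string keeps user
input from being interpreted as part of the statement.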



RE: Why read row is so slower than read column.

2014-12-27 Thread Andreas Finke
Hi,

I would recommend turning tracing on in CQL. Using this you can find out which
part of the query results in high latency.

http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/tracing_r.html

Regards
Andi


From: yhq...@sina.com [yhq...@sina.com]
Sent: 26 December 2014 14:01
To: user
Subject: Why read row is so slower than read column.


Hi, all:

   In my CF, each row has two columns: one column is the timestamp (64-bit),
and the other column is data which may be about 500k.


   When I read the whole row, the QPS is about 30.

   When I read only the data column, the QPS is about 500.


   Why is read performance so much slower when such a small column is added to the read?


Thanks.



Best practice for sorting on frequent updated column?

2014-12-27 Thread ziju feng
I need to sort data on a frequently updated column, such as the like count of
an item. The common way of getting data sorted in Cassandra is to make the
column to be sorted on a clustering key. However, whenever such a column is
updated, we need to delete the row with the old value and insert a new one,
which not only can generate a lot of tombstones, but also requires a
read-before-write if we don't know the original value (such as when using a
counter table to maintain the count and propagating it to the table that
needs to sort on the count).
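
To make the pattern concrete, here is a minimal Python sketch, assuming a
hypothetical table items_by_likes with partition key bucket and clustering
columns like_count and item_id. It only builds the two CQL statements that
every single count change would need.

```python
def move_item(item_id: str, old_count: int, new_count: int) -> list:
    """CQL needed to move one item in a hypothetical table clustered by like_count."""
    return [
        # The old row can only be addressed by its old count,
        # which is why the original value has to be read first.
        "DELETE FROM items_by_likes WHERE bucket = 0 "
        f"AND like_count = {old_count} AND item_id = '{item_id}'",
        # The item then reappears at its new position in the sort order.
        "INSERT INTO items_by_likes (bucket, like_count, item_id) "
        f"VALUES (0, {new_count}, '{item_id}')",
    ]


# One extra like on item-42 costs a delete (one tombstone) plus an insert.
statements = move_item("item-42", 10, 11)
```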

I was wondering what the best practice is for such a use case. I'm currently
using DSE Search to handle it, but I would like to see a Cassandra-only
solution.

Thanks.


Re: Re: Why read row is so slower than read column.

2014-12-27 Thread Eric Stevens
Can you send us your exact data model?  Even though you normally use
Thrift, you may also be able to access the data from CQL, and if so, query
tracing is a very powerful feature in CQL which may describe why there is a
performance difference.

Do you do deletes of data?  If so, tombstones really may be the cause of
the performance difference.

On Fri, Dec 26, 2014 at 6:58 PM, yhq...@sina.com wrote:

 I use the Thrift interface to query the data.


 - -

 What do your CQL queries look like?

 -- Jack Krupansky
