Re: any code to load large data from web into Cassandra
Sorry, but you are still not being clear. In particular, "website data" has no common, defined meaning. You'll need to use some standard, defined terminology or specific examples so that we can have some idea what you are referring to. The blog post you cited is referring to the Twitter API, presumably to read tweets. Okay, fine, but you'll have to be more specific about what you want to do with them. Yes, Cassandra is primarily focused on structured data, but you can of course store unstructured and semi-structured data as blobs, JSON strings, map columns, etc. Please describe in a little more detail what problem you are trying to solve.

I mean, website data might mean any data (in any format) stored at a web URL, which might be a web page, a data file linked by a web page, or... it could be a REST API (like Twitter). Or it could be... whatever. Cassandra is basically a storage engine; it can store anything. There is a wide variety of tools that can be used to ingest data from the infinite variety of data sources. But you'll need to state more specifically what you are actually trying to accomplish. Also, "large data" could be... anything, like Big Data. So more specificity is needed.

Alternatively, you could hire a consultant to help guide you through the application analysis process to determine your application requirements, and then you could simply post your application requirements, or at least a concise summary or relevant excerpt.

-- Jack Krupansky

On Sat, Dec 27, 2014 at 1:48 AM, Joanne Contact joannenetw...@gmail.com wrote:

Thank you. I did not express my question clearly. I wonder if there is sample code to load any website data into Cassandra? Say, this webpage http://datatomix.com/?p=84 seems to use Python and tweepy to call the Twitter API, get data in JSON format, and then load that data into Cassandra. So it seems tweepy is specific to the Twitter API. Is there code for any website? Btw, I am not familiar with Python yet, so the answer does not have to be limited to Python. Thanks!

On Fri, Dec 26, 2014 at 12:46 PM, Keith Sterling keith.sterl...@first-utility.com wrote:

Take a look at sstableloader. We use it to load 30M+ rows into Cassandra. The DataStax documentation is a good start.

-- *Keith Sterling* *Head of Software* *E:* keith.sterl...@first-utility.com stephen.l...@first-utility.com *P:* +44 7771 597 630 *W:* first-utility.com http://www.first-utility.com/ *A:* Opus 40 Business Park, Haywood Road, Warwick CV34 5AH

On Fri, Dec 26, 2014 at 7:59 PM, Joanne Contact joannenetw...@gmail.com wrote:

Hello, I am new. I did not seem to find the answer after some brief research. Please help. Thanks! J
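Jack's point that semi-structured web data can be stored as JSON strings or map columns can be illustrated with a short Python sketch. It assumes the DataStax Python driver (`pip install cassandra-driver`), a keyspace named `demo`, and a hypothetical table `web_docs`; the column layout and the helper names are illustrative, not something proposed in the thread.

```python
import json
import urllib.request
import uuid

# Hypothetical table: keep the raw JSON document as text, plus a
# map<text, text> holding its top-level scalar fields.
SCHEMA = """
CREATE TABLE IF NOT EXISTS web_docs (
    id uuid PRIMARY KEY,
    source_url text,
    raw_json text,
    attrs map<text, text>
)
"""


def fetch_json(url):
    """Download a URL and decode its body as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def to_row(source_url, record):
    """Turn one JSON object into an (id, source_url, raw_json, attrs) tuple.

    Nested objects, lists and nulls stay only inside raw_json; top-level
    strings, numbers and booleans are also copied into attrs as strings.
    """
    attrs = {
        key: str(value)
        for key, value in record.items()
        if isinstance(value, (str, int, float, bool))
    }
    raw = json.dumps(record, sort_keys=True)
    return (uuid.uuid4(), source_url, raw, attrs)


def store(rows, hosts=("127.0.0.1",), keyspace="demo"):
    """Insert rows with the DataStax Python driver."""
    # Imported lazily so to_row() can be used without the driver installed.
    from cassandra.cluster import Cluster

    cluster = Cluster(list(hosts))
    try:
        session = cluster.connect(keyspace)
        session.execute(SCHEMA)
        insert = session.prepare(
            "INSERT INTO web_docs (id, source_url, raw_json, attrs) "
            "VALUES (?, ?, ?, ?)"
        )
        for row in rows:
            session.execute(insert, row)  # Python dict maps to a CQL map
    finally:
        cluster.shutdown()
```

For example, with a placeholder `url = "https://example.com/data.json"`, one might call `data = fetch_json(url)` and then `store([to_row(url, r) for r in (data if isinstance(data, list) else [data])])`. Keeping the raw document next to a flat map means nested fields are never lost, while simple top-level values stay readable from CQL.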
Re: any code to load large data from web into Cassandra
Check out this DataStax article: http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsBulkloader_t.html

And code examples can be found here: https://github.com/PatrickCallaghan/datastax-bulkloader-writer-example

You can write a writer in Scala or Java which will convert CSV etc. into SSTables, and then use sstableloader to load them directly into Cassandra.

K

-- Keith Sterling, Head of Software

On Sat, Dec 27, 2014 at 1:11 PM, Jack Krupansky jack.krupan...@gmail.com wrote:
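Keith's route writes SSTables offline with a Java or Scala writer (as in the linked bulkloader example) and then streams them into the cluster with sstableloader. For readers who prefer Python and have more moderate volumes, a different technique, concurrent prepared inserts through the DataStax Python driver's `execute_concurrent_with_args`, can be sketched as follows. The `readings` table, its columns, the CSV header, and the timestamp format are all hypothetical.

```python
import csv
from datetime import datetime

# Hypothetical target table (not from the thread):
# CREATE TABLE readings (meter_id text, ts timestamp, value double,
#                        PRIMARY KEY (meter_id, ts));
INSERT_CQL = "INSERT INTO readings (meter_id, ts, value) VALUES (?, ?, ?)"


def parse_rows(lines):
    """Yield (meter_id, ts, value) tuples from CSV lines with a header row."""
    for rec in csv.DictReader(lines):
        yield (
            rec["meter_id"],
            # Naive datetime; the driver interprets it as UTC.
            datetime.strptime(rec["ts"], "%Y-%m-%d %H:%M:%S"),
            float(rec["value"]),
        )


def load_csv(path, hosts=("127.0.0.1",), keyspace="demo", concurrency=50):
    """Stream a CSV file into Cassandra with concurrent prepared inserts.

    Returns the number of rows that were written successfully.
    """
    # Imported lazily so parse_rows() can be used without the driver.
    from cassandra.cluster import Cluster
    from cassandra.concurrent import execute_concurrent_with_args

    cluster = Cluster(list(hosts))
    try:
        session = cluster.connect(keyspace)
        insert = session.prepare(INSERT_CQL)
        with open(path, newline="") as f:
            # Blocks until every row has been sent; keeps `concurrency`
            # requests in flight at a time.
            results = execute_concurrent_with_args(
                session, insert, parse_rows(f), concurrency=concurrency
            )
        return sum(1 for success, _ in results if success)
    finally:
        cluster.shutdown()
```

Every row here goes through the normal write path, so for tens of millions of rows the SSTable-plus-sstableloader approach Keith describes may be the better fit.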
Re: any code to load large data from web into Cassandra
I think Joanne is talking not about bulk loading, but about just general access, as with any standard client driver.

Joanne, this is a pretty broad topic. You would need to have some part of a website built in some language such as Python, Java, or some other language. Then you would use an appropriate client driver for the programming language you used for the rest of your website. If you are just getting started with programming websites, I would start by making one which doesn't use a database at all. Once you can submit a form and see the data you submitted, try to find a client driver for your language and insert that data into your database. A contact form is usually a good place to start, as it is fairly simple.

On Sat, Dec 27, 2014, 8:11 AM Keith Sterling keith.sterl...@first-utility.com wrote:
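The contact-form route suggested in this reply might look like the following Python sketch: decode a submitted form body with the standard library, validate it, and insert it with a prepared statement. The `contact_messages` table and its field names are hypothetical, and `session` is assumed to be a connected session from the DataStax Python driver.

```python
import uuid
from datetime import datetime, timezone
from urllib.parse import parse_qs

# Hypothetical table, created beforehand in cqlsh:
# CREATE TABLE contact_messages (id timeuuid PRIMARY KEY, name text,
#                                email text, message text, received timestamp);
INSERT_CQL = (
    "INSERT INTO contact_messages (id, name, email, message, received) "
    "VALUES (?, ?, ?, ?, ?)"
)

REQUIRED_FIELDS = ("name", "email", "message")


def parse_contact_form(body):
    """Decode an application/x-www-form-urlencoded body into a dict.

    Raises ValueError if a required field is missing or blank.
    """
    fields = parse_qs(body)  # blank values are dropped by default
    contact = {}
    for field in REQUIRED_FIELDS:
        values = fields.get(field)
        if not values or not values[0].strip():
            raise ValueError(f"missing field: {field}")
        contact[field] = values[0].strip()
    return contact


def save_contact(session, prepared_insert, contact):
    """Insert one submission using a statement prepared from INSERT_CQL."""
    session.execute(
        prepared_insert,
        (
            uuid.uuid1(),  # time-based UUID for the timeuuid key
            contact["name"],
            contact["email"],
            contact["message"],
            datetime.now(timezone.utc),
        ),
    )
```

In an application you would connect and prepare once at startup, for example `session = Cluster(["127.0.0.1"]).connect("demo")` (with `from cassandra.cluster import Cluster`) and `insert = session.prepare(INSERT_CQL)`, then call `save_contact(session, insert, parse_contact_form(body))` for each submitted form.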