Re: Tool for SQL -> Cassandra data movement

2011-11-02 Thread Brian O'Neill

COTS/open-source ETL tools exist to do this (Talend, Pentaho, CloverETL, etc.).
With those, you should be able to do this without writing any code.

All of the tools can read from a SQL database.  Then you just need to push
the data into Cassandra.   Many of the ETL tools support web services, which
is why I suggested a REST layer for Cassandra might be handy.  Using the ETL
tool, you could push the data into Cassandra as JSON over REST.  (If you
want, give Virgil <http://code.google.com/a/apache-extras.org/p/virgil/>  a
try)  
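
For example (an untested sketch; the endpoint path and JSON layout below are only
illustrative assumptions -- check the Virgil docs for the actual URL scheme), a plain
Java HTTP PUT is all an ETL tool or a small loader would need:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class RestLoadSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint: keyspace "ks", column family "users", row key "1".
        // The real path depends on the REST layer you deploy (e.g. Virgil).
        URL url = new URL("http://localhost:8080/virgil/data/ks/users/1");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);

        // One row from the SQL side, expressed as a JSON map of column name -> value.
        String json = "{\"first_name\":\"Radim\",\"source_db\":\"postgresql\"}";
        OutputStream out = conn.getOutputStream();
        out.write(json.getBytes("UTF-8"));
        out.close();

        System.out.println("HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}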

I haven't tried, but you might also be able to coax the ETL tools to use
CQL.  

Some of the ETL tools are Map/Reduce friendly (more or less) and can
distribute the job over a cluster.  But if you have a lot of data, you may
also want to look at Pig and/or Map/Reduce directly.   If you stage the
CSV/JSON file on HDFS, then a simple Map/Reduce job can load the data
directly into Cassandra (using ColumnFamilyOutputFormat).
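
As a rough, hedged sketch of that approach (keyspace, column family, host and CSV
layout are made-up placeholders, and the ConfigHelper method names shifted a little
between Cassandra releases, so check your version), a map-only job could look like:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

import org.apache.cassandra.hadoop.ColumnFamilyOutputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.Mutation;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class CsvToCassandra {

    // Turns each CSV line "rowkey,value" into one mutation for the target row.
    public static class CsvMapper
            extends Mapper<LongWritable, Text, ByteBuffer, List<Mutation>> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split(",");

            Column col = new Column();
            col.setName(ByteBufferUtil.bytes("value"));
            col.setValue(ByteBufferUtil.bytes(fields[1]));
            col.setTimestamp(System.currentTimeMillis() * 1000); // microseconds

            ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
            cosc.setColumn(col);
            Mutation mutation = new Mutation();
            mutation.setColumn_or_supercolumn(cosc);

            ctx.write(ByteBufferUtil.bytes(fields[0]), Arrays.asList(mutation));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "csv-to-cassandra");
        job.setJarByClass(CsvToCassandra.class);
        job.setMapperClass(CsvMapper.class);
        job.setNumReduceTasks(0); // map-only: write straight to Cassandra

        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0])); // CSV staged on HDFS

        job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
        job.setOutputKeyClass(ByteBuffer.class);
        job.setOutputValueClass(List.class);
        ConfigHelper.setOutputColumnFamily(job.getConfiguration(), "MyKeyspace", "MyColumnFamily");
        ConfigHelper.setOutputInitialAddress(job.getConfiguration(), "127.0.0.1");
        ConfigHelper.setOutputPartitioner(job.getConfiguration(),
                "org.apache.cassandra.dht.RandomPartitioner");

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}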

We are solving this problem right now, so I'll report back.

-brian

 
Brian O'Neill
Lead Architect, Software Development
Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406
p: 215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/



From:  Maxim Potekhin 
Organization:  Brookhaven National Laboratory
Reply-To:  
Date:  Tue, 01 Nov 2011 14:18:00 -0400
To:  
Subject:  Re: Tool for SQL -> Cassandra data movement


 Just a short comment -- we are going the CSV way as well because of its
compactness and extreme portability. The CSV files are kept in the cloud as
backup. They can also find other uses. JSON would work as well, but it would
be at least twice as large in size.
 
 Maxim
 
 On 9/22/2011 1:25 PM, Nehal Mehta wrote:
> We are trying to do the same thing, but instead of migrating into JSON, we
> are exporting into CSV and then importing the CSV into Cassandra.  Which DB are
> you currently using?
>  
>  Thanks,
>  Nehal Mehta. 
>  
>  
>  2011/9/22 Radim Kolar 
>  
>> I need a tool which is able to dump tables via JDBC into JSON format for
>> Cassandra import. I am pretty sure that somebody has already written one.
>>
>> Are there tools which can do a direct JDBC -> Cassandra import?
>>  
>  
>  
>  
 
 




Re: Tool for SQL -> Cassandra data movement

2011-11-01 Thread Maxim Potekhin
Just a short comment -- we are going the CSV way as well because of its
compactness and extreme portability. The CSV files are kept in the cloud as
backup. They can also find other uses. JSON would work as well, but it would
be at least twice as large in size.

Maxim

On 9/22/2011 1:25 PM, Nehal Mehta wrote:
We are trying to do the same thing, but instead of migrating into
JSON, we are exporting into CSV and then importing the CSV into
Cassandra.  Which DB are you currently using?


Thanks,
Nehal Mehta.

2011/9/22 Radim Kolar <h...@sendmail.cz>

I need a tool which is able to dump tables via JDBC into JSON format
for Cassandra import. I am pretty sure that somebody has already written one.

Are there tools which can do a direct JDBC -> Cassandra import?






Re: Tool for SQL -> Cassandra data movement

2011-09-27 Thread Nehal Mehta
Hi,

Instead of passing it as command-line arguments, I am storing all of this
configuration in config/config.xml.

My earlier version was command line, but then as the arguments increased I
shifted to config.xml. Plus I thought providing all credentials at the command
line is also not a good idea. A sample config file is
https://github.com/nehalmehta/CSV2Cassandra/blob/master/config/config.xml.

I am going to add the following features: Cassandra credentials, selected
columns, and a selected primary key. I believe it is a good idea to have function
calls which can manipulate selected CSV columns before inserting records.

Thanks,
Nehal Mehta.
On Tue, Sep 27, 2011 at 8:03 PM, Radim Kolar  wrote:

> > I have cleaned up my code that imports CSV into Cassandra and I have put
> > it open on https://github.com/nehalmehta/CSV2Cassandra.
> > Have a look if it is useful to you.
>
> Hello,
>  I will remake this tool into something like Oracle SQL*Loader.
> Basically, you will pass a control file as a command-line argument. I need
> conversion from DATE to a milliseconds-based date, headerless CSV, and better
> CSV escaping.
>
> example of control file
>
> options (rows=1000)
> LOAD DATA
>  INFILE  'c:\tmp\searches.csv'
>  BADFILE 'c:\tmp\searches.bad'
>  REPLACE
>  INTO TABLE SEARCHES2
>  FIELDS TERMINATED BY ","
>  OPTIONALLY ENCLOSED BY '"'
>  (  query,
day date 'YYYY-MM-DD',
> results,
> ip
>   )
>
> Or maybe I will start the project from scratch.
>


Re: Tool for SQL -> Cassandra data movement

2011-09-27 Thread Radim Kolar
> I have cleaned up my code that imports CSV into Cassandra and I have
> put it open on https://github.com/nehalmehta/CSV2Cassandra. Have a look
> if it is useful to you.

Hello,
I will remake this tool into something like Oracle SQL*Loader. Basically,
you will pass a control file as a command-line argument. I need conversion
from DATE to a milliseconds-based date, headerless CSV, and better CSV
escaping.


example of control file

options (rows=1000)
LOAD DATA
  INFILE  'c:\tmp\searches.csv'
  BADFILE 'c:\tmp\searches.bad'
  REPLACE
  INTO TABLE SEARCHES2
  FIELDS TERMINATED BY ","
  OPTIONALLY ENCLOSED BY '"'
  (  query,
 day date 'YYYY-MM-DD',
 results,
 ip
   )

Or maybe I will start the project from scratch.
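
For the DATE conversion, something as small as this would do (a sketch only,
assuming an ISO yyyy-MM-dd input like the 'YYYY-MM-DD' mask above; the column
handling around it is up to the loader):

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class DateToMillis {
    // Parses a CSV date field like "2011-09-27" into milliseconds since the epoch,
    // suitable for a Cassandra column value or timestamp.
    static long toMillis(String csvField) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.parse(csvField).getTime();
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(toMillis("2011-09-27")); // 1317081600000
    }
}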


Re: Tool for SQL -> Cassandra data movement

2011-09-22 Thread Nehal Mehta
Hi Ramdin,

I have cleaned up my code that imports CSV into Cassandra and I have put it
open on https://github.com/nehalmehta/CSV2Cassandra. Have a look if it is
useful to you.

I have used Hector instead of sstableloader. For me it was necessary to have a
consistency level of EACH_QUORUM.
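
Roughly, the write path looks like this (a simplified sketch, not the importer's
actual code; the keyspace, column family, and host names are placeholders):

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class HectorInsertSketch {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "localhost:9160");

        // Require EACH_QUORUM on writes so every data center acknowledges the insert.
        ConfigurableConsistencyLevel policy = new ConfigurableConsistencyLevel();
        policy.setDefaultWriteConsistencyLevel(HConsistencyLevel.EACH_QUORUM);

        Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster, policy);

        // One CSV row -> one row key with a couple of columns.
        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
        mutator.addInsertion("row-1", "MyColumnFamily",
                HFactory.createStringColumn("name", "value"));
        mutator.addInsertion("row-1", "MyColumnFamily",
                HFactory.createStringColumn("imported_at", "2011-09-22"));
        mutator.execute();

        HFactory.shutdownCluster(cluster);
    }
}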

Thanks,
Nehal Mehta.

On Thu, Sep 22, 2011 at 11:22 PM, Radim Kolar  wrote:

> On 22.9.2011 19:25, Nehal Mehta wrote:
>
>> We are trying to do the same thing, but instead of migrating into JSON,
>> we are exporting into CSV and then importing the CSV into Cassandra.
>
> You are right, CSV seems to be more portable.
>
>> Which DB are you currently using?
>
> PostgreSQL and Apache Derby.
>


Re: Tool for SQL -> Cassandra data movement

2011-09-22 Thread Radim Kolar

On 22.9.2011 19:25, Nehal Mehta wrote:
We are trying to do the same thing, but instead of migrating into
JSON, we are exporting into CSV and then importing the CSV into Cassandra.

You are right, CSV seems to be more portable.


Which DB are you currently using?

PostgreSQL and Apache Derby.


Re: Tool for SQL -> Cassandra data movement

2011-09-22 Thread Jeremy Hanna
Take a look at http://www.datastax.com/dev/blog/bulk-loading

I'm sure it could be built on and made more seamless for what you want to do, but
the recent bulk loading additions will provide the best foundation.
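
The post is built around SSTableSimpleUnsortedWriter plus the sstableloader tool.
Very roughly (a sketch only -- the writer's constructor arguments changed between
releases, so treat the exact signature as an assumption and check the javadoc for
your version; keyspace and column family names are placeholders):

import java.io.File;
import org.apache.cassandra.db.marshal.AsciiType;
import org.apache.cassandra.dht.RandomPartitioner;
import org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter;
import org.apache.cassandra.utils.ByteBufferUtil;

public class BulkLoadSketch {
    public static void main(String[] args) throws Exception {
        // Write sstables into a directory named after the keyspace, then stream
        // them to the ring with bin/sstableloader <directory>.
        File dir = new File("MyKeyspace");
        dir.mkdirs();

        // ASSUMPTION: constructor arguments as in the 1.0-era API; earlier
        // versions omit the partitioner. Check the javadoc for your release.
        SSTableSimpleUnsortedWriter writer = new SSTableSimpleUnsortedWriter(
                dir, new RandomPartitioner(), "MyKeyspace", "Users",
                AsciiType.instance, null, 64 /* buffer size in MB */);

        long timestamp = System.currentTimeMillis() * 1000; // microseconds
        writer.newRow(ByteBufferUtil.bytes("user-1"));
        writer.addColumn(ByteBufferUtil.bytes("name"), ByteBufferUtil.bytes("Radim"), timestamp);
        writer.close();
    }
}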

On Sep 22, 2011, at 12:25 PM, Nehal Mehta wrote:

> We are trying to do the same thing, but instead of migrating into JSON, we
> are exporting into CSV and then importing the CSV into Cassandra.  Which DB are
> you currently using?
> 
> Thanks,
> Nehal Mehta. 
> 
> 2011/9/22 Radim Kolar 
> I need a tool which is able to dump tables via JDBC into JSON format for
> Cassandra import. I am pretty sure that somebody has already written one.
>
> Are there tools which can do a direct JDBC -> Cassandra import?
> 



Re: Tool for SQL -> Cassandra data movement

2011-09-22 Thread Nehal Mehta
We are trying to do the same thing, but instead of migrating into JSON,
we are exporting into CSV and then importing the CSV into Cassandra.  Which DB
are you currently using?

Thanks,
Nehal Mehta.

2011/9/22 Radim Kolar 

> I need a tool which is able to dump tables via JDBC into JSON format for
> Cassandra import. I am pretty sure that somebody has already written one.
>
> Are there tools which can do a direct JDBC -> Cassandra import?
>