sorting with sstableloader
Hi,

I have a table with the following schema on Cassandra 1.2.13:

CREATE TABLE TEST_TABLE (
  keyCol bigint,
  col1 bigint,
  col2 bigint,
  col3 text,
  PRIMARY KEY (keyCol, col1, col2)
) WITH CLUSTERING ORDER BY (col1 DESC, col2 DESC);

I used SSTableSimpleUnsortedWriter and sstableloader to load some data for the key keyCol1. select * from test_table returned:

keyCol  | col1 | col2 | col3
keyCol1 | 10   | 1    | abcde
keyCol1 | 9    | 1    | adfsa

I then created another set of sstables containing a record for keyCol1 with col1 = 20 and loaded them with sstableloader:

keyCol  | col1 | col2 | col3
keyCol1 | 20   | 1    | afd

When I queried select * from test_table again, I expected:

keyCol1 | 20 | 1 | afd
keyCol1 | 10 | 1 | abcde
keyCol1 | 9  | 1 | adfsa

but it returned:

keyCol1 | 10 | 1 | abcde
keyCol1 | 9  | 1 | adfsa
keyCol1 | 20 | 1 | afd

How can I fix this sorting? The row with col1 = 20 was inserted later using sstableloader, so it shows up as the last row. Is there any way to rewrite the sstables to fix the order, or do I need to run something after sstableloader?

Thanks,
Varun
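To make the expected result concrete, here is a minimal sketch in plain Java (Java 16+ for records; no Cassandra classes are used, and the names ClusteringOrderSketch, Row, CLUSTERING_ORDER and merge are hypothetical). It assumes only what the schema says: within one partition, CLUSTERING ORDER BY (col1 DESC, col2 DESC) sorts rows by col1 descending, then col2 descending. It treats each load as a run that is already sorted by that order and merges the two runs. This models only the expected result; it is not Cassandra's read path.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ClusteringOrderSketch {
    // One CQL row of TEST_TABLE inside the single partition keyCol1.
    record Row(long col1, long col2, String col3) {}

    // Models CLUSTERING ORDER BY (col1 DESC, col2 DESC).
    static final Comparator<Row> CLUSTERING_ORDER =
            Comparator.comparingLong(Row::col1).reversed()
                      .thenComparing(Comparator.comparingLong(Row::col2).reversed());

    // Merges two runs that are each already sorted by CLUSTERING_ORDER.
    static List<Row> merge(List<Row> a, List<Row> b) {
        List<Row> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            // Take whichever head row comes first in clustering order.
            if (CLUSTERING_ORDER.compare(a.get(i), b.get(j)) <= 0) {
                out.add(a.get(i++));
            } else {
                out.add(b.get(j++));
            }
        }
        // Copy whatever remains of the run that was not exhausted.
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    public static void main(String[] args) {
        List<Row> firstLoad = List.of(new Row(10, 1, "abcde"), new Row(9, 1, "adfsa"));
        List<Row> secondLoad = List.of(new Row(20, 1, "afd"));
        for (Row r : merge(firstLoad, secondLoad)) {
            System.out.println("keyCol1 | " + r.col1() + " | " + r.col2() + " | " + r.col3());
        }
    }
}
```

Running main prints the rows with col1 = 20, 10, 9 in that order, which is the order the question expected. The merge produces that order only because both runs were sorted with the same descending comparator.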
SSTableloader
Hi,

I am trying to load about a million records using sstableloader with Cassandra 1.2. It streams very fast, but at the end the streaming gets stuck on two or three machines in the cluster, while all the rest are 100% done. Has anybody seen this problem, and is there any tool I can use to diagnose the load?

Thanks in advance,
Varun
Re: Bulkoutputformat
Thanks Rahul, the article was insightful.

On Fri, Dec 13, 2013 at 12:25 AM, Rahul Menon wrote:

> Here you go
>
> http://thelastpickle.com/blog/2013/01/11/primary-keys-in-cql.html
>
> On Fri, Dec 13, 2013 at 7:19 AM, varun allampalli <vshoori.off...@gmail.com> wrote:
>
>> Hi Aaron,
>>
>> It seems like you answered the question here.
>>
>> https://groups.google.com/forum/#!topic/nosql-databases/vjZA5vdycWA
>>
>> Can you give me the link to the blog which you mentioned
>>
>> http://thelastpickle.com/2013/01/11/primary-keys-in-cql/
>>
>> Thanks in advance
>> Varun
>>
>> On Thu, Dec 12, 2013 at 3:36 PM, varun allampalli <vshoori.off...@gmail.com> wrote:
>>
>>> Thanks Aaron, I was able to generate sstables and load using
>>> sstableloader. But after loading the tables when I do a select query I get
>>> this, the table has only one record. Is there anything I am missing or any
>>> logs I can look at.
>>>
>>> Request did not complete within rpc_timeout.
>>>
>>> On Wed, Dec 11, 2013 at 7:58 PM, Aaron Morton wrote:
>>>
>>>> If you don’t need to use Hadoop then try the SSTableSimpleWriter and
>>>> sstableloader, this post is a little old but still relevant
>>>> http://www.datastax.com/dev/blog/bulk-loading
>>>>
>>>> Otherwise AFAIK BulkOutputFormat is what you want from hadoop
>>>> http://www.datastax.com/docs/1.1/cluster_architecture/hadoop_integration
>>>>
>>>> Cheers
>>>>
>>>> -
>>>> Aaron Morton
>>>> New Zealand
>>>> @aaronmorton
>>>>
>>>> Co-Founder & Principal Consultant
>>>> Apache Cassandra Consulting
>>>> http://www.thelastpickle.com
>>>>
>>>> On 12/12/2013, at 11:27 am, varun allampalli wrote:
>>>>
>>>> Hi All,
>>>>
>>>> I want to bulk insert data into cassandra. I was wondering of using
>>>> BulkOutputformat in hadoop. Is it the best way or using driver and doing
>>>> batch insert is the better way.
>>>>
>>>> Are there any disandvantages of using bulkoutputformat.
>>>>
>>>> Thanks for helping
>>>>
>>>> Varun
Re: Bulkoutputformat
Hi Aaron,

It seems like you answered this question here:
https://groups.google.com/forum/#!topic/nosql-databases/vjZA5vdycWA

Can you give me the link to the blog post you mentioned?
http://thelastpickle.com/2013/01/11/primary-keys-in-cql/

Thanks in advance,
Varun

On Thu, Dec 12, 2013 at 3:36 PM, varun allampalli wrote:

> Thanks Aaron, I was able to generate sstables and load using
> sstableloader. But after loading the tables when I do a select query I get
> this, the table has only one record. Is there anything I am missing or any
> logs I can look at.
>
> Request did not complete within rpc_timeout.
>
> On Wed, Dec 11, 2013 at 7:58 PM, Aaron Morton wrote:
>
>> If you don’t need to use Hadoop then try the SSTableSimpleWriter and
>> sstableloader, this post is a little old but still relevant
>> http://www.datastax.com/dev/blog/bulk-loading
>>
>> Otherwise AFAIK BulkOutputFormat is what you want from hadoop
>> http://www.datastax.com/docs/1.1/cluster_architecture/hadoop_integration
>>
>> Cheers
>>
>> -
>> Aaron Morton
>> New Zealand
>> @aaronmorton
>>
>> Co-Founder & Principal Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> On 12/12/2013, at 11:27 am, varun allampalli wrote:
>>
>> Hi All,
>>
>> I want to bulk insert data into cassandra. I was wondering of using
>> BulkOutputformat in hadoop. Is it the best way or using driver and doing
>> batch insert is the better way.
>>
>> Are there any disandvantages of using bulkoutputformat.
>>
>> Thanks for helping
>>
>> Varun
Re: Bulkoutputformat
Thanks Aaron, I was able to generate sstables and load them using sstableloader. But after loading, when I run a select query I get the error below, even though the table has only one record. Is there anything I am missing, or any logs I can look at?

Request did not complete within rpc_timeout.

On Wed, Dec 11, 2013 at 7:58 PM, Aaron Morton wrote:

> If you don’t need to use Hadoop then try the SSTableSimpleWriter and
> sstableloader, this post is a little old but still relevant
> http://www.datastax.com/dev/blog/bulk-loading
>
> Otherwise AFAIK BulkOutputFormat is what you want from hadoop
> http://www.datastax.com/docs/1.1/cluster_architecture/hadoop_integration
>
> Cheers
>
> -
> Aaron Morton
> New Zealand
> @aaronmorton
>
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On 12/12/2013, at 11:27 am, varun allampalli wrote:
>
> Hi All,
>
> I want to bulk insert data into cassandra. I was wondering of using
> BulkOutputformat in hadoop. Is it the best way or using driver and doing
> batch insert is the better way.
>
> Are there any disandvantages of using bulkoutputformat.
>
> Thanks for helping
>
> Varun
Bulkoutputformat
Hi All,

I want to bulk insert data into Cassandra. I was considering using BulkOutputFormat in Hadoop. Is that the best way, or is using a driver and doing batch inserts better?

Are there any disadvantages to using BulkOutputFormat?

Thanks for helping,
Varun