Re: Brisk vs Cloudera Distribution

2012-02-08 Thread Edward Capriolo
Hadoop can work on a number of filesystems: HDFS, S3, local files. The Brisk
file system is known as CFS. CFS stores all block data and metadata in
Cassandra, so it does not use a NameNode. Brisk fires up a JobTracker
automatically as well. Brisk also has a Hive metastore backed by Cassandra,
which takes away that SPOF too.
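
To make the "no NameNode" point concrete, here is a toy Python model. It is not Brisk's actual code or schema. A hypothetical `ToyCFS` keeps both the file blocks and the inode-style metadata that a NameNode would normally hold as rows in a replicated key-value store. That store, `ReplicatedStore`, is a stand-in for Cassandra. The `cfs:///` URI scheme, the row-key layout, the block size, and the replication factor are all assumptions made for illustration.

```python
# Toy model (hypothetical, NOT Brisk's real CFS schema): file blocks and the
# metadata a NameNode would normally hold both live in one replicated store.
import zlib
from urllib.parse import urlparse

BLOCK_SIZE = 4  # bytes per block; tiny so a short file spans several blocks


class ReplicatedStore:
    """Stand-in for a Cassandra cluster: each key is copied to `rf` nodes."""

    def __init__(self, num_nodes=3, rf=2):
        self.num_nodes, self.rf = num_nodes, rf
        self.nodes = [{} for _ in range(num_nodes)]

    def _replicas(self, key):
        # Deterministic placement: hash the key, then take rf consecutive nodes.
        start = zlib.crc32(key.encode()) % self.num_nodes
        return [(start + i) % self.num_nodes for i in range(self.rf)]

    def put(self, key, value):
        for n in self._replicas(key):
            self.nodes[n][key] = value

    def get(self, key, down=()):
        # Read from the first replica that is not down; with rf >= 2 any
        # single node can be lost without losing data or metadata.
        for n in self._replicas(key):
            if n not in down:
                return self.nodes[n][key]
        raise KeyError(key)


class ToyCFS:
    """Hypothetical CFS-like filesystem with no NameNode process."""

    def __init__(self, store):
        self.store = store

    @staticmethod
    def _path(uri):
        parsed = urlparse(uri)
        if parsed.scheme != "cfs":  # assumed scheme for this sketch
            raise ValueError(f"not a cfs:// URI: {uri}")
        return parsed.path

    def write(self, uri, data: bytes):
        path = self._path(uri)
        block_keys = []
        for i in range(0, len(data), BLOCK_SIZE):
            key = f"block:{path}:{i // BLOCK_SIZE}"
            self.store.put(key, data[i:i + BLOCK_SIZE])
            block_keys.append(key)
        # The "namespace" entry is just another replicated row.
        self.store.put(f"inode:{path}", {"length": len(data), "blocks": block_keys})

    def read(self, uri, down=()):
        inode = self.store.get(f"inode:{self._path(uri)}", down)
        return b"".join(self.store.get(k, down) for k in inode["blocks"])


fs = ToyCFS(ReplicatedStore())
fs.write("cfs:///user/rk/words.txt", b"hello brisk cfs")
print(fs.read("cfs:///user/rk/words.txt", down={0}))  # b'hello brisk cfs'
```

Because the metadata row is replicated like any data row, losing one node leaves the namespace readable. That is the property being contrasted with a single NameNode.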

Brisk Snappy-compresses all data, so you may not need to use compression or
SequenceFiles. Performance-wise, I have gotten comparable numbers with
TeraSort and TeraGen. But the systems work very differently, and they likely
scale differently.
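
Application-level compression may become unnecessary because the storage layer can do the work. If it compresses every block on write and decompresses on read, a job can emit plain text and still get compressed storage. This is a hypothetical sketch. `CompressingBlockStore` is an invented name, and zlib from the Python standard library stands in for Snappy, which is what Brisk uses but which is not in the standard library.

```python
import zlib


class CompressingBlockStore:
    """Hypothetical block store that compresses on write and decompresses on
    read, so callers never configure a codec themselves.
    zlib stands in for Snappy here (assumption for this sketch)."""

    def __init__(self):
        self._blocks = {}

    def put(self, block_id, data: bytes):
        self._blocks[block_id] = zlib.compress(data)

    def get(self, block_id) -> bytes:
        return zlib.decompress(self._blocks[block_id])

    def stored_bytes(self):
        return sum(len(v) for v in self._blocks.values())


store = CompressingBlockStore()
# A job writes plain, repetitive text output: no SequenceFile or codec setup.
text = b"key\tvalue\n" * 10_000
store.put("part-00000", text)
print(len(text), "logical bytes ->", store.stored_bytes(), "stored bytes")
```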

The Hive integration is solid. I'm not sure what the biggest cluster is, and
I'd rather not make other vague performance claims. Brisk is not active
anymore; the commercial product is DSE (DataStax Enterprise). There is a
GitHub fork of Brisk, however.

On Wednesday, February 8, 2012, rk vishu talk2had...@gmail.com wrote:



Re: Brisk vs Cloudera Distribution

2012-02-08 Thread rk vishu
Thank you for the information.

On Wed, Feb 8, 2012 at 8:57 PM, Edward Capriolo edlinuxg...@gmail.com wrote:




Brisk vs Cloudera Distribution

2012-02-07 Thread rk vishu
Hello All,

Could anyone help me understand the pros and cons of Brisk vs. Cloudera Hadoop
(HDFS + HBase) in terms of functionality and performance?
I'd like to set aside the single point of failure (NameNode) issue while comparing.
Are there any big, petabyte-scale clusters using Brisk in production? How does
CFS performance compare with HDFS? How good is the Hive integration?

Thanks and regards,
RK