HDFS disk space requirement

2013-01-10 Thread Panshul Whisper
Hello,

I have a Hadoop cluster of 5 nodes with a total of 130 GB of available HDFS
space and replication set to 5.
I have a 115 GB file that needs to be copied to HDFS and processed.
Do I need any more HDFS space to perform all the processing without
running into problems, or is this space sufficient?

-- 
Regards,
Ouch Whisper
010101010101


Re: HDFS disk space requirement

2013-01-10 Thread பாலாஜி நாராயணன்
If the replication factor is 5, you will need at least 5x the size of the
file. So this is not going to be enough.
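
The arithmetic behind this can be sketched in a few lines of Python. This is only a rough estimate under the assumption that raw usage is simply the data size times the replication factor; the function names are made up for illustration and are not part of any Hadoop API.

```python
def raw_space_needed_gb(logical_gb, replication):
    """Raw HDFS space consumed when every block is stored `replication` times."""
    return logical_gb * replication


def shortfall_gb(logical_gb, replication, available_gb):
    """Extra raw space required; 0 if the replicated data already fits."""
    return max(0, raw_space_needed_gb(logical_gb, replication) - available_gb)


# Numbers from the question: a 115 GB file, replication 5, 130 GB available.
needed = raw_space_needed_gb(115, 5)   # 115 * 5 = 575 GB
missing = shortfall_gb(115, 5, 130)    # 575 - 130 = 445 GB
print(f"need {needed} GB raw, short by {missing} GB")
```

This counts only the input file itself; any job output or temporary data would come on top.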


-- 
http://balajin.net/blog
http://flic.kr/balajijegan


Re: HDFS disk space requirement

2013-01-10 Thread Ravi Mutyala
If the file is a text file, you could get a good compression ratio. With
compression and the replication changed to 3, the file could fit. But I am
not sure what your use case is or what you want to achieve by putting this
data there. Any transformation of this data would need more space to save
the transformed data.

If you have 5 nodes and they are not virtual machines, you should consider
adding more hard disks to your cluster.
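
To get a feel for how much compression would be needed, here is a minimal Python sketch. It assumes raw usage equals compressed size times replication, and the function name is hypothetical. Whether a given data set actually reaches that ratio depends on the data and the codec, so treat the result as a target rather than a prediction.

```python
def min_compression_ratio(logical_gb, replication, available_gb):
    """Smallest ratio (uncompressed size / compressed size) at which the
    replicated, compressed data fits into `available_gb` of raw space."""
    return logical_gb * replication / available_gb


# 115 GB of input, replication 3, 130 GB of raw HDFS space.
ratio = min_compression_ratio(115, 3, 130)  # 345 / 130
print(f"input must compress at least {ratio:.2f}x to fit")
```

Even if the input fits this way, the transformed data mentioned above still needs its own space.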



Re: HDFS disk space requirement

2013-01-10 Thread Panshul Whisper
Thank you for the response.

Actually it is not a single file. I have JSON files that amount to 115 GB;
these JSON files need to be processed and loaded into HBase tables
on the same cluster for later processing. Not considering the disk space
required for the HBase storage, if I reduce the replication to 3, how much
more HDFS space will I require?

Thank you,


-- 
Regards,
Ouch Whisper
010101010101


Re: HDFS disk space requirement

2013-01-10 Thread Alexander Pivovarov
finish elementary school first. (plus, minus operations at least)




Re: HDFS disk space requirement

2013-01-10 Thread shashwat shriparv
115 * 5 = 575 GB is the minimum you need. Keep in mind that this is only the
minimum; you will have other disk space needs too...
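
The same calculation for several replication factors, including the replication-3 case asked about earlier in the thread, might look like the sketch below. The `headroom` parameter is an assumption added here to stand in for the "other disk space needs"; nobody in the thread gives a figure for it, so it defaults to 0 and any value you pass is your own estimate.

```python
def plan(logical_gb, available_gb, replications=(1, 2, 3, 5), headroom=0.0):
    """Return (replication, raw GB required, extra GB needed) per factor.

    `headroom` is a hypothetical fraction added on top of the replicated
    input for job output, temporary files, and similar.
    """
    rows = []
    for r in replications:
        required = logical_gb * r * (1 + headroom)
        extra = max(0.0, required - available_gb)
        rows.append((r, required, extra))
    return rows


# 115 GB of JSON input, 130 GB of raw HDFS space, no headroom assumed.
for r, required, extra in plan(115, 130):
    print(f"replication={r}: need {required:.0f} GB raw, "
          f"{extra:.0f} GB more than available")
```

With no headroom, replication 3 needs 345 GB raw, which is 215 GB more than the 130 GB available, and that is before counting HBase storage.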



∞
Shashwat Shriparv


