HDFS disk space requirement
Hello, I have a Hadoop cluster of 5 nodes with a total of 130 GB of available HDFS space, with replication set to 5. I have a file of 115 GB which needs to be copied to HDFS and processed. Do I need any more HDFS space to perform all the processing without running into problems, or is this space sufficient? -- Regards, Ouch Whisper 010101010101
Re: HDFS disk space requirement
If the replication factor is 5, you will need at least 5x the space of the file, so this is not going to be enough. -- http://balajin.net/blog http://flic.kr/balajijegan
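The 5x rule above can be sketched as a quick back-of-the-envelope check, using the numbers from this thread (115 GB of data, 130 GB of free HDFS space):

```shell
#!/bin/sh
# Raw HDFS space needed = data size x replication factor.
DATA_GB=115
AVAILABLE_GB=130
REPLICATION=5

NEEDED_GB=$((DATA_GB * REPLICATION))
echo "Need ${NEEDED_GB} GB, have ${AVAILABLE_GB} GB"

if [ "$NEEDED_GB" -gt "$AVAILABLE_GB" ]; then
    echo "Not enough HDFS space at replication ${REPLICATION}"
fi
```

On a live cluster you can confirm the actual configured and remaining capacity with `hdfs dfsadmin -report`.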
Re: HDFS disk space requirement
If the file is a txt file, you could get a good compression ratio. Change the replication to 3 and the file will fit. But I am not sure what your use case is, or what you want to achieve by putting this data there. Any transformation on this data will need more space to save the transformed data. If you have 5 nodes and they are not virtual machines, you should consider adding more hard disks to your cluster.
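A sketch of what the suggestions above might look like on the command line; `/data/input.txt` is a hypothetical path, and these commands assume a running cluster:

```shell
# Upload with a per-file replication factor of 3 instead of the cluster default:
hadoop fs -D dfs.replication=3 -put input.txt /data/input.txt

# Or lower the replication of a file that is already in HDFS
# (-w waits until re-replication completes):
hadoop fs -setrep -w 3 /data/input.txt

# Compress the text before uploading; gzip often gets good ratios on text,
# though note plain gzip files are not splittable for MapReduce processing:
gzip input.txt
hadoop fs -D dfs.replication=3 -put input.txt.gz /data/input.txt.gz
```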
Re: HDFS disk space requirement
Thank you for the response. Actually, it is not a single file; I have JSON files that amount to 115 GB. These JSON files need to be processed and loaded into HBase tables on the same cluster for later processing. Not considering the disk space required for the HBase storage, if I reduce the replication to 3, how much more HDFS space will I require? Thank you, -- Regards, Ouch Whisper 010101010101
Re: HDFS disk space requirement
Finish elementary school first (plus and minus operations, at least).
Re: HDFS disk space requirement
115 * 5 = 575 GB is the minimum you need, and keep in mind that is the bare minimum; you will have other disk space needs too... ∞ Shashwat Shriparv
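Putting the thread's arithmetic in one place, including an answer to the earlier replication-3 question (a sketch using only the numbers quoted in this thread):

```shell
#!/bin/sh
DATA_GB=115
AVAILABLE_GB=130

# Replication 5: the cluster's current setting.
echo "rep 5: need $((DATA_GB * 5)) GB, have ${AVAILABLE_GB} GB"

# Replication 3: still more than the 130 GB available, so the raw
# JSON would not fit even at replication 3 without compression.
echo "rep 3: need $((DATA_GB * 3)) GB, have ${AVAILABLE_GB} GB"
echo "rep 3 shortfall: $((DATA_GB * 3 - AVAILABLE_GB)) GB"
```

So at replication 3 the raw data needs 345 GB, roughly 215 GB more than the 130 GB currently available, before counting any intermediate or HBase output.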