Re: HELP: I wanna store the output value into a list not write to the disk
It seems that the InMemoryFileSystem class has been deprecated in Hadoop 0.19.1. Why? I want to reuse the output of the reduce as the map input for the next time step. Cascading does not work for me, because the data of each step depends on the previous step. I run each time step's MapReduce job synchronously. If InMemoryFileSystem is deprecated, how can I reduce the I/O of each time step's MapReduce job?

2009/4/2 Farhan Husain wrote:
> Is there a way to implement some OutputCollector that can do what Andy
> wants to do?

--
Chen He
RCF CSE Dept.
University of Nebraska-Lincoln
US
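A minimal sketch of the chained, one-job-at-a-time pattern described above, in plain Java so it runs without a cluster. Each time step's output directory becomes the next step's input, and the previous step's output is deleted once it has been consumed. `runStep` is a hypothetical stand-in for one step's MapReduce job. In a real driver it would set the paths with `FileInputFormat.setInputPaths` and `FileOutputFormat.setOutputPath` on a `JobConf` and call `JobClient.runJob`, which blocks until the job completes. The step logic (adding 1 to each number) and the `step-N` directory names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

public class IterativeDriverSketch {

    // Hypothetical stand-in for one time step's MapReduce job. A real driver would
    // build a JobConf, call FileInputFormat.setInputPaths(conf, in) and
    // FileOutputFormat.setOutputPath(conf, out), then JobClient.runJob(conf),
    // which returns only after the job has finished.
    // Here the "job" just adds 1 to every number in the input.
    static void runStep(Path in, Path out) throws IOException {
        Files.createDirectories(out);
        List<String> next = new ArrayList<>();
        for (String line : Files.readAllLines(in.resolve("part-00000"))) {
            next.add(Long.toString(Long.parseLong(line.trim()) + 1));
        }
        Files.write(out.resolve("part-00000"), next);
    }

    // Runs the steps strictly one after another: step t reads what step t-1 wrote.
    static Path runAll(Path initialInput, Path workDir, int steps) throws IOException {
        Path current = initialInput;
        for (int t = 0; t < steps; t++) {
            Path out = workDir.resolve("step-" + t);
            runStep(current, out);
            if (!current.equals(initialInput)) {
                deleteRecursively(current); // previous step's output has been consumed
            }
            current = out;
        }
        return current;
    }

    // Local-disk cleanup; on HDFS this would be FileSystem.delete(path, true).
    static void deleteRecursively(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            // Children sort after their parent, so reverse order deletes files first.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        Path work = Files.createTempDirectory("iterative");
        Path input = Files.createDirectories(work.resolve("input"));
        Files.write(input.resolve("part-00000"), "0\n10\n".getBytes());
        Path result = runAll(input, work, 3);
        System.out.println(Files.readAllLines(result.resolve("part-00000"))); // prints [3, 13]
    }
}
```

The disk round trip stays, but only one step's output is kept at a time; lowering the replication of these intermediate outputs, as Owen suggests elsewhere in this thread, would cut the write cost further.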
> > >> If what you said exists, there should be a mechanism which sends > output > > as > > >> objects rather than file content across computers, as far as I know > > there > > >> is > > >> no such feature yet. > > >> > > >> Good luck. > > >> > > >> 2009/4/2 andy2005cst > > >> > > >>> > > >>> I need to use the output of the reduce, but I don't know how to do. > > >>> use the wordcount program as an example if i want to collect the > > >>> wordcount > > >>> into a hashtable for further use, how can i do? > > >>> the example just show how to let the result onto disk. > > >>> myemail is : andy2005...@gmail.com > > >>> looking forward your help. thanks a lot. > > >>> -- > > >>> View this message in context: > > >>> > > > http://www.nabble.com/HELP%3A-I-wanna-store-the-output-value-into-a-list-not-write-to-the-disk-tp22844277p22844277.html > > >>> Sent from the Hadoop core-user mailing list archive at Nabble.com. > > >>> > > >>> > > >> > > >> > > >> -- > > >> M. Raşit ÖZDAŞ > > >> > > >> > > > > > > -- > > > View this message in context: > > > http://www.nabble.com/HELP%3A-I-wanna-store-the-output-value-into-a-list-not-write-to-the-disk-tp22844277p22848070.html > > > Sent from the Hadoop core-user mailing list archive at Nabble.com. > > > > > > > > > > > > > > -- > > M. Raşit ÖZDAŞ > > > > > > -- > Mohammad Farhan Husain > Research Assistant > Department of Computer Science > Erik Jonsson School of Engineering and Computer Science > University of Texas at Dallas > -- Chen He RCF CSE Dept. University of Nebraska-Lincoln US
Re: HELP: I wanna store the output value into a list not write to the disk
Is there a way to implement some OutputCollector that can do what Andy wants to do?

On Thu, Apr 2, 2009 at 10:21 AM, Rasit OZDAS wrote:
> But if you have a lot of big files, this approach won't be suitable, I
> think.
>
> Maybe someone can give further info.

--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
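One way to read Farhan's question is as a collector that keeps pairs in a list instead of writing them out. This sketch is plain Java: `OutputCollector` here is a local stand-in with the same single `collect(key, value)` method as Hadoop's old-API interface, so it compiles without Hadoop, and `reduce` is a simplified word-count reducer without the `Reporter` argument. It only helps when you drive the reduce logic yourself inside one JVM, for example in a unit test. In a submitted job the framework supplies its own collector, and each reduce task runs in its own JVM, usually on another machine, so a list filled there would never reach the driver. That matches Rasit's point that there is no mechanism for shipping output objects between machines.

```java
import java.io.IOException;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class ListCollectorSketch {

    // Local stand-in with the same shape as Hadoop's old-API
    // org.apache.hadoop.mapred.OutputCollector<K, V>, so this compiles without Hadoop.
    interface OutputCollector<K, V> {
        void collect(K key, V value) throws IOException;
    }

    // Collector that keeps every (key, value) pair in memory instead of writing it out.
    static class ListOutputCollector<K, V> implements OutputCollector<K, V> {
        private final List<Map.Entry<K, V>> pairs = new ArrayList<>();

        @Override
        public void collect(K key, V value) {
            // With real Writables, copy key and value here: the framework may reuse
            // the objects it hands to the reducer, and reducers often pass them through.
            pairs.add(new AbstractMap.SimpleImmutableEntry<>(key, value));
        }

        List<Map.Entry<K, V>> pairs() {
            return Collections.unmodifiableList(pairs);
        }
    }

    // Simplified word-count reduce (no Reporter argument): sums the counts for one word.
    static void reduce(String word, Iterator<Integer> counts,
                       OutputCollector<String, Integer> out) throws IOException {
        int sum = 0;
        while (counts.hasNext()) {
            sum += counts.next();
        }
        out.collect(word, sum);
    }

    public static void main(String[] args) throws IOException {
        ListOutputCollector<String, Integer> out = new ListOutputCollector<>();
        reduce("hadoop", Arrays.asList(1, 1, 1).iterator(), out);
        reduce("hello", Arrays.asList(2).iterator(), out);
        System.out.println(out.pairs()); // prints [hadoop=3, hello=2]
    }
}
```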
Re: HELP: I wanna store the output value into a list not write to the disk
I don't really see what the downside of reading it from disk is. A list of word counts should be pretty small on disk, so it shouldn't take long to read it into a HashMap. Doing anything else means going a long way out of your way to end up with the same result.

-Bryan

On Apr 2, 2009, at 2:41 AM, andy2005cst wrote:
> I need to use the output of the reduce, but I don't know how to do it.
> Using the wordcount program as an example, if I want to collect the word
> counts into a hashtable for further use, how can I do that?
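A minimal sketch of what Bryan describes: reading word-count output back into a `HashMap`. It assumes the job used the default `TextOutputFormat`, which writes one `key<TAB>value` line per record into files named `part-00000`, `part-00001`, and so on, and that the output directory has been copied to local disk (for example with `hadoop fs -get`). Reading straight from HDFS would open each part file through Hadoop's `FileSystem` API instead. The class and method names are illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class WordCountLoader {

    // Parses "word<TAB>count" lines (TextOutputFormat's default layout) into counts.
    static void parseInto(BufferedReader reader, Map<String, Integer> counts) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip malformed lines rather than failing the whole load
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.merge(word, count, Integer::sum); // sums duplicates across part files
        }
    }

    // Loads every part-* file in a locally copied job output directory.
    static Map<String, Integer> loadOutputDir(Path dir) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(dir, "part-*")) {
            for (Path part : parts) {
                try (BufferedReader reader = Files.newBufferedReader(part)) {
                    parseInto(reader, counts);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        parseInto(new BufferedReader(new StringReader("hadoop\t3\nhello\t1\n")), counts);
        System.out.println(counts.get("hadoop")); // prints 3
    }
}
```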
Re: HELP: I wanna store the output value into a list not write to the disk
That seems interesting; we use a replication factor of 3 by default. Is there a way to set, say, a replication factor of 1 only for job-specific files?

2009/4/2 Owen O'Malley:
> You can use an output format and then an input format that uses a database,
> but in practice, the cost of writing to hdfs and reading it back is not a
> problem, especially if you set the replication of the output files to 1.
> (You'll need to re-run the job if you lose a node, but it will be fast.)
>
> -- Owen

--
M. Raşit ÖZDAŞ
Re: HELP: I wanna store the output value into a list not write to the disk
On Apr 2, 2009, at 2:41 AM, andy2005cst wrote:
> I need to use the output of the reduce, but I don't know how to do it.
> Using the wordcount program as an example, if I want to collect the word
> counts into a hashtable for further use, how can I do that?

You can use an output format and then an input format that uses a database, but in practice, the cost of writing to hdfs and reading it back is not a problem, especially if you set the replication of the output files to 1. (You'll need to re-run the job if you lose a node, but it will be fast.)

-- Owen
Re: HELP: I wanna store the output value into a list not write to the disk
Andy, I haven't tried this feature, but I know that Yahoo set a performance record with this file format. I came across a file system included in the Hadoop code (probably that one) when searching the source code. Luckily I found it: org.apache.hadoop.fs.InMemoryFileSystem. But if you have a lot of big files, this approach won't be suitable, I think.

Maybe someone can give further info.

2009/4/2 andy2005cst:
> You mentioned a special file format; could you please show me what it is,
> and give an example if possible?

--
M. Raşit ÖZDAŞ
Re: HELP: I wanna store the output value into a list not write to the disk
Thanks for your reply. Let me explain more clearly. Since MapReduce is just one step of my program, I need to use the output of the reduce for further computation, so I don't want to write the output to disk; I want to get the collection or list of the output in RAM. If it writes directly to disk, I have to read it back into RAM again. You mentioned a special file format; could you please show me what it is, and give an example if possible?

Thank you so much.

Rasit OZDAS wrote:
> Hi, Hadoop is normally designed to write to disk. There is a special file
> format which writes output to RAM instead of disk, but I don't know if it's
> what you're looking for.

--
View this message in context:
http://www.nabble.com/HELP%3A-I-wanna-store-the-output-value-into-a-list-not-write-to-the-disk-tp22844277p22848070.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: HELP: I wanna store the output value into a list not write to the disk
Hi, Hadoop is normally designed to write to disk. There is a special file format which writes output to RAM instead of disk, but I don't know if it's what you're looking for. For what you describe to work, there would have to be a mechanism that sends output across computers as objects rather than as file content; as far as I know, there is no such feature yet.

Good luck.

2009/4/2 andy2005cst:
> I need to use the output of the reduce, but I don't know how to do it.
> Using the wordcount program as an example, if I want to collect the word
> counts into a hashtable for further use, how can I do that?

--
M. Raşit ÖZDAŞ
HELP: I wanna store the output value into a list not write to the disk
I need to use the output of the reduce, but I don't know how to do it. Using the wordcount program as an example, if I want to collect the word counts into a hashtable for further use, how can I do that? The example only shows how to write the result to disk. My email is: andy2005...@gmail.com. Looking forward to your help. Thanks a lot.

--
View this message in context:
http://www.nabble.com/HELP%3A-I-wanna-store-the-output-value-into-a-list-not-write-to-the-disk-tp22844277p22844277.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.