Re: Memory mapped resources
There are systems for file-system plumbing out to user processes; FUSE does this on Linux, and there is a FUSE package for Hadoop. However, pretending a remote resource is local holds a place of honor in the system-design antipattern hall of fame.

On Wed, Apr 13, 2011 at 7:35 AM, Benson Margulies wrote:
> Point taken.

--
Lance Norskog
goks...@gmail.com
Re: Memory mapped resources
Point taken.

On Wed, Apr 13, 2011 at 10:33 AM, M. C. Srivas wrote:
> Sorry, don't mean to say you don't know mmap or didn't do cool things in
> the past. But you will see why anyone would've interpreted this original
> post, given the title of the posting and the following wording, to mean
> "can I mmap files that are in hdfs".
Re: Memory mapped resources
Sorry, don't mean to say you don't know mmap or didn't do cool things in the past. But you will see why anyone would've interpreted this original post, given the title of the posting and the following wording, to mean "can I mmap files that are in hdfs":

On Mon, Apr 11, 2011 at 3:57 PM, Benson Margulies wrote:
> We have some very large files that we access via memory mapping in
> Java. Someone's asked us about how to make this conveniently
> deployable in Hadoop. If we tell them to put the files into hdfs, can
> we obtain a File for the underlying file on any given node?
Re: Memory mapped resources
Guys, I'm not the one who said 'HDFS', unless I had a brain bubble in my original message. I asked for a distribution mechanism for code + mappable data, and I appreciate the arrival of some suggestions.

Ted is correct that I know quite a bit about mmap; I had a lot to do with the code in ObjectStore that ran a VM system in user space by mmapping files and handling SIGSEGV and the Windows moral equivalent. I am fully aware that a 'simple' HDFS solution involves hauling the entire file onto each node, contiguously, and that a 'complex' one involves either an NFS server on top of HDFS or, for the truly enthusiastic, things in the kernel à la ClearCase.

On Wed, Apr 13, 2011 at 12:09 AM, Ted Dunning wrote:
> Benson is actually a pretty sophisticated guy who knows a lot about mmap.
> I engaged with him yesterday on this since I know him from Apache.
Re: Memory mapped resources
On April 12, 2011 21:50:07 Luke Lu wrote:
> You can use distributed cache for memory mapped files (they're local
> to the node the tasks run on.)
>
> http://developer.yahoo.com/hadoop/tutorial/module5.html#auxdata

We adopted this solution for a similar problem. For a program we developed, each mapper needed read-only access to an index about 4 GB in size. We distributed the index to each node with the distributed cache and then accessed it with mmap. The 4 GB are loaded into memory, but shared by all the map tasks on the same node. Our mapper is written in C, so we can call mmap directly; in Java you may be able to get the same effect with java.nio.channels.FileChannel.

Luca

--
Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
09010 Pula (CA), Italy
Tel: +39 0709250452
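A minimal sketch of the FileChannel approach Luca mentions, assuming the distributed cache has already localized the index to a known local path (the path here is hypothetical). Note that a single MappedByteBuffer can address at most Integer.MAX_VALUE bytes, so a 4 GB index needs several mappings; see the chunking discussion further down-thread.

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MmapIndex {
        public static void main(String[] args) throws Exception {
            // Hypothetical local path where the distributed cache put the index.
            String path = args.length > 0 ? args[0] : "/tmp/index.bin";

            RandomAccessFile raf = new RandomAccessFile(path, "r");
            FileChannel ch = raf.getChannel();

            // Map (up to the first 2 GB of) the file read-only. The OS keeps one
            // copy of the pages in physical memory, shared by every process and
            // JVM that maps the same file.
            long mapLen = Math.min(ch.size(), Integer.MAX_VALUE);
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, mapLen);
            System.out.println("first byte: " + buf.get(0));

            ch.close();
            raf.close();
        }
    }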
Re: Memory mapped resources
Benson is actually a pretty sophisticated guy who knows a lot about mmap. I engaged with him yesterday on this since I know him from Apache.

On Tue, Apr 12, 2011 at 7:16 PM, M. C. Srivas wrote:
> I am not sure if you realize, but HDFS is not VM integrated.
Re: Memory mapped resources
I am not sure if you realize it, but HDFS is not VM-integrated. What you are asking for is support *inside* the Linux kernel for HDFS file systems. I don't see that happening for the next few years, and probably never at all. (HDFS is all Java today, and Java certainly is not going to go inside the kernel.)

The ways to get there are:

a) use the hdfs-fuse proxy;
b) do it by hand: copy the file onto each individual machine's local disk, and then mmap the local path;
c) more or less the same as (b), but use a thing called the "Distributed Cache" in Hadoop to do the copying, and then mmap the local path (a sketch of this appears below);
d) don't use HDFS, and instead use something else for this purpose.

On Tue, Apr 12, 2011 at 10:40 AM, Benson Margulies wrote:
> Here's the OP again.
>
> I want to make it clear that my question here has to do with the
> problem of distributing 'the program' around the cluster, not 'the
> data'. [...]
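A hedged sketch of option (c) with the Hadoop 0.20-era API: the driver registers an HDFS file in the distributed cache, and each task resolves the localized copy to a real local-filesystem path that is safe to mmap. The model path and class name are made up for illustration.

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    public class CacheWiring {
        // Driver side: ask the framework to copy an HDFS file onto the
        // local disk of every node that runs a task of this job.
        public static void addModel(JobConf conf) throws Exception {
            DistributedCache.addCacheFile(new URI("/models/big-model.bin"), conf);
        }

        // Task side (e.g. in Mapper.configure): find the localized copy.
        public static Path localModel(JobConf conf) throws Exception {
            Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
            return localFiles[0]; // a local path, mmap-able just as in option (b)
        }
    }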
Re: Memory mapped resources
You can use distributed cache for memory mapped files (they're local to the node the tasks run on):

http://developer.yahoo.com/hadoop/tutorial/module5.html#auxdata

On Tue, Apr 12, 2011 at 10:40 AM, Benson Margulies wrote:
> Here's the OP again.
>
> I want to make it clear that my question here has to do with the
> problem of distributing 'the program' around the cluster, not 'the
> data'. [...]
Re: Memory mapped resources
Actually, it doesn't become trivial. It just becomes total fail or total win instead of almost always being a partial win. And it doesn't meet Benson's need.

On Tue, Apr 12, 2011 at 11:09 AM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
> To get around the chunks or blocks problem, I've been implementing a
> system that simply sets a max block size that is too large for a file
> to reach. In this way there will only be one block per HDFS file, and
> so mmap'ing or other single-file ops become trivial.
Re: Memory mapped resources
To get around the chunks-or-blocks problem, I've been implementing a system that simply sets a max block size that is too large for a file to reach. In this way there will only be one block per HDFS file, and so mmap'ing or other single-file ops become trivial. (A sketch of how such a file might be written appears below.)

On Tue, Apr 12, 2011 at 10:40 AM, Benson Margulies wrote:
> Here's the OP again.
>
> I want to make it clear that my question here has to do with the
> problem of distributing 'the program' around the cluster, not 'the
> data'. [...]
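For illustration, a hedged sketch of one way to write a file with a per-file block size larger than the file will ever be, so it lands in a single HDFS block. The path, replication factor, and 8 GB figure are arbitrary assumptions, not values from this thread.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class OneBlockFile {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // Pick a block size comfortably larger than the file itself;
            // HDFS then stores the whole file as one block.
            long blockSize = 8L * 1024 * 1024 * 1024; // 8 GB, arbitrary
            Path dest = new Path("/models/big-model.bin"); // hypothetical

            FSDataOutputStream out =
                fs.create(dest, true /* overwrite */, 64 * 1024, (short) 3, blockSize);
            out.write(new byte[] { 42 }); // ...write the real model bytes here
            out.close();
        }
    }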
Re: Memory mapped resources
Here's the OP again.

I want to make it clear that my question here has to do with the problem of distributing 'the program' around the cluster, not 'the data'. In the case at hand, the issue is a system that has a large data resource that it needs to do its work. Every instance of the code needs the entire model, not just some blocks or pieces.

Memory mapping is a very attractive tactic for this kind of data resource. The data is read-only, and memory-mapping it allows the operating system to ensure that only one copy of the thing ends up in physical memory.

If we force the model into a conventional file (storable in HDFS) and read it into the JVM in a conventional way, then we get as many copies in memory as we have JVMs. On a big machine with a lot of cores, this begins to add up.

For people who are running a cluster of relatively conventional systems, just putting copies on all the nodes in a conventional place is adequate.
Re: Memory mapped resources
Blocks live where they land when first created. They can be moved due to node failure or rebalancing, but it is typically pretty expensive to do this, and certainly slower than just reading the file.

If you really, really want mmap to work, then you need to set up some native code that builds an mmap'ed region but sets all pages to no-access where the corresponding block is non-local, and sets the pages to access the local block where the block is local. Then you can intercept the segmentation violations that occur on page access to non-local data, read that block to local storage, and mmap it into those pages. This is LOTS of work to get exactly right, and it must be done in C, since Java can't really handle seg faults correctly. This pattern is fairly commonly used in garbage-collected languages to allow magical remapping of memory without explicit tests.

On Tue, Apr 12, 2011 at 8:24 AM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
> Interesting. I'm not familiar with how blocks go local, however I'm
> interested in how to make this occur via a manual oriented call. Eg,
> is there an option available that guarantees locality, and if not,
> perhaps there's work being done towards that path?
Re: Memory mapped resources
> The others you will have to read more conventionally

True. I think there are emergent use cases that demand data locality, e.g. an optimized HBase system, search, and mmap'ing.

> If all blocks are guaranteed local, this would work. I don't think that
> guarantee is possible on a non-trivial cluster

Interesting. I'm not familiar with how blocks go local, but I'm interested in how to make this occur via a manual call. E.g., is there an option available that guarantees locality, and if not, perhaps there's work being done towards that path?

On Tue, Apr 12, 2011 at 8:08 AM, Ted Dunning wrote:
> Well, no. You could mmap all the blocks that are local to the node your
> program is on. The others you will have to read more conventionally. If
> all blocks are guaranteed local, this would work. I don't think that
> guarantee is possible on a non-trivial cluster.
Re: Memory mapped resources
Well, no. You could mmap all the blocks that are local to the node your program is on; the others you will have to read more conventionally. If all blocks are guaranteed local, this would work. I don't think that guarantee is possible on a non-trivial cluster.

On Tue, Apr 12, 2011 at 6:32 AM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
> Then one could mmap the blocks pertaining to the HDFS file and piece
> them together. Lucene's MMapDirectory implementation does just this
> to avoid an obscure JVM bug.
Re: Memory mapped resources
Kevin,

You present a good discussion of architectural alternatives here, but my comment really had more to do with whether a particular HDFS patch would provide what the original poster seemed to be asking about. This is especially pertinent since the patch was intended to scratch a different itch.

On Tue, Apr 12, 2011 at 5:51 AM, Kevin wrote:
> This is the age old argument of what to share in a partitioned
> environment. IBM and Teradata have always used "shared nothing", which
> is what only getting one chunk of the file in each hadoop node is
> doing. Oracle has always used "shared disk", which is not an easy thing
> to do, especially at scale. [...]
Re: Memory mapped resources
> We have some very large files that we access via memory mapping in
> Java. Someone's asked us about how to make this conveniently
> deployable in Hadoop. If we tell them to put the files into hdfs, can
> we obtain a File for the underlying file on any given node?

We sometimes find it convenient to have a small NFS share across the datanodes for this type of thing. Other times we find it convenient to just package up the data and submit it with the job, so that it can be addressed as a resource on the classpath. Which of those I would find most convenient depends on how large "very large" is.
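For the classpath option, a small hedged sketch: if the data is bundled into the job jar, a task can read it back off the classpath. The resource name is hypothetical, and note that this streams a private copy into each JVM; it does not share physical pages the way mmap does.

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;

    public class ClasspathResource {
        public static byte[] loadModel() throws IOException {
            // "/model/big-model.bin" is a hypothetical entry in the job jar.
            InputStream in =
                ClasspathResource.class.getResourceAsStream("/model/big-model.bin");
            if (in == null) {
                throw new IOException("model not found on classpath");
            }
            try {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray(); // one private copy per JVM
            } finally {
                in.close();
            }
        }
    }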
Re: Memory mapped resources
Then one could mmap the blocks pertaining to the HDFS file and piece them together. Lucene's MMapDirectory implementation does just this to avoid an obscure JVM bug.

On Mon, Apr 11, 2011 at 9:09 PM, Ted Dunning wrote:
> Yes. But only one such block. That is what I meant by chunk.
> That is fine if you want that chunk but if you want to mmap the entire
> file, it isn't real useful.
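A rough sketch of the chunked mapping Jason refers to, under stated assumptions: one MappedByteBuffer can address at most Integer.MAX_VALUE bytes, so a large local file is mapped as an array of fixed-size buffers and each read is routed to the chunk that covers it. The chunk size and class name are my own for illustration, not Lucene's.

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class ChunkedMmap {
        private static final long CHUNK = 1L << 30; // 1 GB per mapping, arbitrary
        private final MappedByteBuffer[] chunks;

        public ChunkedMmap(String path) throws Exception {
            RandomAccessFile raf = new RandomAccessFile(path, "r");
            FileChannel ch = raf.getChannel();
            long size = ch.size();
            int n = (int) ((size + CHUNK - 1) / CHUNK);
            chunks = new MappedByteBuffer[n];
            for (int i = 0; i < n; i++) {
                long off = i * CHUNK;
                long len = Math.min(CHUNK, size - off);
                // Each chunk is an independent read-only mapping of the file.
                chunks[i] = ch.map(FileChannel.MapMode.READ_ONLY, off, len);
            }
        }

        // Route a logical file offset to the mapping that covers it.
        public byte get(long pos) {
            return chunks[(int) (pos / CHUNK)].get((int) (pos % CHUNK));
        }
    }

Multi-byte reads that straddle a chunk boundary need extra care; MMapDirectory handles that case explicitly.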
RE: Memory mapped resources
This is the age-old argument of what to share in a partitioned environment. IBM and Teradata have always used "shared nothing", which is what only getting one chunk of the file in each Hadoop node is doing. Oracle has always used "shared disk", which is not an easy thing to do, especially at scale, and seems to have varying results depending on the application, transactional or DSS. Here are a couple of web references:

http://www.informatik.uni-trier.de/~ley/db/conf/vldb/Bhide88.html

http://jhingran.typepad.com/anant_jhingrans_musings/2010/02/shared-nothing-vs-shared-disks-the-cloud-sequel.html

Rather than say shared nothing isn't useful, Hadoop should look to how others make this work. The two key problems to avoid are data skew, where one node sees too much data and becomes the slow node, and large intra-partition joins, where large data is needed from more than one partition and potentially gets copied around.

Rather than hybridizing into shared disk, I think Hadoop should hybridize into the shared-data solutions others use, replication of select data, for solving intra-partition joins in a "shared nothing" architecture. This may be more database terminology that could be addressed by HBase, but I think it is good background for the questions of memory mapping files in Hadoop.

Kevin

-----Original Message-----
From: Ted Dunning [mailto:tdunn...@maprtech.com]
Sent: Tuesday, April 12, 2011 12:09 AM
To: Jason Rutherglen
Cc: common-user@hadoop.apache.org; Edward Capriolo
Subject: Re: Memory mapped resources

Yes. But only one such block. That is what I meant by chunk. That is fine if you want that chunk, but if you want to mmap the entire file, it isn't real useful.
Re: Memory mapped resources
Yes. But only one such block. That is what I meant by chunk. That is fine if you want that chunk, but if you want to mmap the entire file, it isn't real useful.

On Mon, Apr 11, 2011 at 6:48 PM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
> What do you mean by local chunk? I think it's providing access to the
> underlying file block?
Re: Memory mapped resources
What do you mean by local chunk? I think it's providing access to the underlying file block?

On Mon, Apr 11, 2011 at 6:30 PM, Ted Dunning wrote:
> Also, it only provides access to a local chunk of a file, which isn't
> very useful.
Re: Memory mapped resources
Also, it only provides access to a local chunk of a file, which isn't very useful.

On Mon, Apr 11, 2011 at 5:32 PM, Edward Capriolo wrote:
> This feature is not yet part of Hadoop, so doing this is not "convenient".
Re: Memory mapped resources
On Mon, Apr 11, 2011 at 7:05 PM, Jason Rutherglen wrote:
> Yes, you can, however it will require customization of HDFS. Take a
> look at HDFS-347, specifically the HDFS-347-branch-20-append.txt patch.
> I have been altering it for use with HBASE-3529. Note that the patch
> noted is for the -append branch, which is mainly for HBase.

This feature is not yet part of Hadoop, so doing this is not "convenient".
Re: Memory mapped resources
Yes, you can, however it will require customization of HDFS. Take a look at HDFS-347, specifically the HDFS-347-branch-20-append.txt patch. I have been altering it for use with HBASE-3529. Note that the patch noted is for the -append branch, which is mainly for HBase.

On Mon, Apr 11, 2011 at 3:57 PM, Benson Margulies wrote:
> We have some very large files that we access via memory mapping in
> Java. Someone's asked us about how to make this conveniently
> deployable in Hadoop. If we tell them to put the files into hdfs, can
> we obtain a File for the underlying file on any given node?
Memory mapped resources
We have some very large files that we access via memory mapping in Java. Someone's asked us about how to make this conveniently deployable in Hadoop. If we tell them to put the files into hdfs, can we obtain a File for the underlying file on any given node?