Re: What should the open file limit be for hbase
Hey Marc: You are still seeing 'too many open files'? What does your schema look like? I added to http://wiki.apache.org/hadoop/Hbase/FAQ#5 a rough formula for counting how many mapfiles are open in a running regionserver. Currently, your only recourse is upping the ulimit. Addressing this scaling barrier will be a focus of the next hbase release.
St.Ack

Marc Harris wrote:
> I have seen that hbase can cause 'too many open files' errors. I increased my limit to 10240 (10 times the previous limit) but still get errors. Is there a recommended value that I should set my open files limit to? Is there something else I can do to reduce the number of files, perhaps with some other trade-off?
> Thanks
> - Marc
What should the open file limit be for hbase
I have seen that hbase can cause 'too many open files' errors. I increased my limit to 10240 (10 times the previous limit) but still get errors. Is there a recommended value that I should set my open files limit to? Is there something else I can do to reduce the number of files, perhaps with some other trade-off?
Thanks
- Marc
Re: What should the open file limit be for hbase
Sorry, I should have said 4 families in one table, obviously, not 4 regions in one table.
- Marc

On Mon, 2008-01-28 at 11:57 -0500, Marc Harris wrote:
> My schema is very simple: 4 regions in one table.
>
> create table pagefetch (
>   info MAX_VERSIONS=1,
>   data MAX_VERSIONS=1,
>   headers MAX_VERSIONS=1,
>   redirects MAX_VERSIONS=1
> );
>
> I am running hadoop in distributed configuration, but with only one data node. I am running hbase with two region servers (one of which is on the same machine as hadoop). I am seeing the exceptions in my datanode log file, by the way, not my regionserver log file.
>
> lsof -p REGIONSERVER_PID | wc -l gave 479
> lsof -p DATANODE_PID | wc -l gave 10287
>
> - Marc
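A side note on the lsof counts above: on Linux, a process's open descriptors are listed under /proc/<pid>/fd, so the same count can be taken without lsof, or from inside a JVM. A small sketch, assuming a Linux /proc filesystem (reading another process's fd directory requires the same user or root; the pid argument here is hypothetical):

    import java.io.File;

    public class OpenFileCount {
        public static void main(String[] args) {
            // Use "self" for the current JVM, or pass another process's pid.
            String pid = args.length > 0 ? args[0] : "self";
            String[] fds = new File("/proc/" + pid + "/fd").list();
            if (fds == null) {
                System.err.println("Cannot read /proc/" + pid + "/fd");
                return;
            }
            System.out.println("open file descriptors: " + fds.length);
        }
    }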
hadoop 0.15.3 r612257 freezes on reduce task
Hi, I set up a cluster with 3 slaves and 1 master. When running the wordcount example, Hadoop execution halts after the map phase: the reduce phase starts and generally gets to 10-20%, but halts there. I left it running for 30 minutes with no change, and the log files don't show any errors. I reformatted the Hadoop FS that I created, and access to the files from all nodes seems fine. Has anyone seen similar behavior?
Thanks,
Florian
Re: distributed cache
jerrro wrote:
> Hello,
> Is there a way to use the Distributed Cache with a pipes (C++ code) job? I want to be able to access a file on the local disk of all the data nodes, so hadoop would copy it to all data nodes before a map reduce job.
> Thanks.

Hi,
First of all you need to copy the files to the dfs, and then add the files to the distributed cache. You can give comma-separated lists of the files or archives to be added to the distributed cache, as mapred.cache.files and mapred.cache.archives in the conf file. For example:

    <property>
      <name>mapred.cache.files</name>
      <value>/files/file1,/files/file2.txt</value>
      <description>The files in distributed cache</description>
    </property>
    <property>
      <name>mapred.cache.archives</name>
      <value>/archives/arc1.zip,/archives/arc2.jar</value>
      <description>The archives in distributed cache</description>
    </property>

You can also give URIs as the file names. If you give a URI of the form hdfs://path#link, mapred will create a symlink named link in the working directory.

Hope this clarifies.
Thanks
Amareshwari
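If you build the JobConf from Java code rather than a conf file, the same entries can be added programmatically. A minimal sketch, assuming the 0.15-era org.apache.hadoop.filecache.DistributedCache API and a hypothetical namenode host; check the class against your release:

    import java.net.URI;

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    public class CacheConfExample {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(CacheConfExample.class);

            // Files and archives must already be in the dfs before submission.
            DistributedCache.addCacheFile(new URI("/files/file1"), conf);
            DistributedCache.addCacheFile(new URI("/files/file2.txt"), conf);
            DistributedCache.addCacheArchive(new URI("/archives/arc1.zip"), conf);

            // A #link fragment asks mapred to create a symlink named "mydata"
            // in the task's working directory, as described above.
            DistributedCache.addCacheFile(new URI("hdfs://namenode/files/file1#mydata"), conf);

            // ... set input/output paths and the job's program, then submit.
        }
    }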
Re: Hbase Matrix Package for Map/Reduce-based Parallel Matrix Computations
> planning to have a domain specific language to support it.

Did you mean the MATLAB-like scientific language interface for matrix computations? If so, you are welcome any time. I made a matrix page on the hadoop wiki: http://wiki.apache.org/hadoop/Matrix. Feel free to add your profile and your plans.
Thanks.

On 1/28/08, Chanwit Kaewkasi [EMAIL PROTECTED] wrote:
> Hi Edward,
> I'm interested in your HBase Matrix package, and I am planning to have a domain specific language to support it. (I'm going to use Groovy for doing this.) But from reading your code, I've found that numerical data from HBase needs to be converted from string to double before computation. Is it possible to have native floating-point support in HBase for doing numerical computation?
> -Chanwit
>
> On 25/01/2008, edward yoon [EMAIL PROTECTED] wrote:
>> I started developing the hbase matrix package and benchmarks comparing the Map/Reduce + Hbase environment with the R-ScaLAPACK + ScaLAPACK + LAPACK + BLACS + BLAS + MPI environment. http://wiki.apache.org/hadoop/DataProcessingBenchmarks
>> --
>> B. Regards,
>> Edward yoon @ NHN, corp.
>
> --
> Chanwit Kaewkasi
> PhD Student, Centre for Novel Computing
> School of Computer Science
> The University of Manchester
> Oxford Road
> Manchester M13 9PL, UK

--
B. Regards,
Edward yoon @ NHN, corp.
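On Chanwit's floating-point question: HBase stores cell values as opaque byte arrays, so one workaround (not a native HBase feature) is to write the raw IEEE 754 bytes of each double instead of its string form, so the compute path avoids string parsing. A minimal encode/decode sketch in plain Java; the HBase put/get calls themselves are omitted:

    import java.nio.ByteBuffer;

    public final class DoubleCodec {

        // Encode a double as its 8-byte IEEE 754 big-endian representation.
        public static byte[] toBytes(double value) {
            return ByteBuffer.allocate(8).putDouble(value).array();
        }

        // Decode the 8-byte representation back into a double.
        public static double fromBytes(byte[] bytes) {
            return ByteBuffer.wrap(bytes).getDouble();
        }

        public static void main(String[] args) {
            byte[] cellValue = toBytes(3.141592653589793);
            System.out.println(fromBytes(cellValue)); // prints 3.141592653589793
        }
    }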