Re: What should the open file limit be for hbase

2008-01-28 Thread stack

Hey Marc:

You are still seeing 'too many open files'?  What does your schema look 
like?  I added to http://wiki.apache.org/hadoop/Hbase/FAQ#5 a rough 
formula for counting how many mapfiles are open in a running regionserver.
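(To make that concrete with made-up numbers: a regionserver hosting 10 
regions of a table with 4 column families, each store carrying a single 
mapfile whose data and index files are both open, is already at about 
10 * 4 * 2 = 80 descriptors, before counting compactions, commit logs, 
and DFS client sockets.  The FAQ entry has the actual formula; the 
figures here are only an illustration.)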


Currently, your only recourse is upping the ulimit.  Addressing this 
scaling barrier will be a focus of the next hbase release.
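(If you need to raise it: 'ulimit -n' shows the current limit for the 
shell, and running e.g. 'ulimit -n 32768' before starting the daemons 
raises it for that session.  To make it permanent on most Linux systems, 
add lines like

  hadoop  soft  nofile  32768
  hadoop  hard  nofile  32768

to /etc/security/limits.conf for the user the daemons run as.  The 
32768 value and the 'hadoop' user name are only examples; pick numbers 
that fit your schema and setup.)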


St.Ack



Marc Harris wrote:

I have seen that hbase can cause 'too many open files' errors. I increased
my limit to 10240 (10 times the previous limit) but still get errors.

Is there a recommended value that I should set my open files limit to?
Is there something else I can do to reduce the number of files, perhaps
with some other trade-off?

Thanks
- Marc


  




What should the open file limit be for hbase

2008-01-28 Thread Marc Harris
I have seen that hbase can cause 'too many open files' errors. I increased
my limit to 10240 (10 times the previous limit) but still get errors.

Is there a recommended value that I should set my open files limit to?
Is there something else I can do to reduce the number of files, perhaps
with some other trade-off?

Thanks
- Marc



Re: What should the open file limit be for hbase

2008-01-28 Thread Marc Harris
Sorry, I should have said 4 families in one table, obviously, not 4
regions in one table.
- Marc
 

On Mon, 2008-01-28 at 11:57 -0500, Marc Harris wrote:

 My schema is very simple: 4 regions in one table.
 
 create table pagefetch (
 info MAX_VERSIONS=1,
 data MAX_VERSIONS=1,
 headers MAX_VERSIONS=1,
 redirects MAX_VERSIONS=1
 );
 
 I am running hadoop in distributed configuration, but with only one data
 node.
 I am running hbase with two region servers (one of which is on the same
 machine as hadoop).
 I am seeing the exceptions in my datanode log file, by the way, not my
 regionserver log file.
 lsof -p REGIONSERVER_PID | wc -l gave 479
 lsof -p DATANODE_PID | wc -l gave 10287
 
 - Marc
 
 
 
 
 On Mon, 2008-01-28 at 08:13 -0800, stack wrote:
 
  Hey Marc:
  
  You are still seeing 'too many open files'?  What does your schema look 
  like?  I added to http://wiki.apache.org/hadoop/Hbase/FAQ#5 a rough 
  formula for counting how many mapfiles are open in a running regionserver.
  
  Currently, your only recourse is upping the ulimit.  Addressing this 
  scaling barrier will be a focus of the next hbase release.
  
  St.Ack
  
  
  
  Marc Harris wrote:
   I have seen that hbase can cause 'too many open files' errors. I increased
   my limit to 10240 (10 times the previous limit) but still get errors.
  
   Is there a recommended value that I should set my open files limit to?
   Is there something else I can do to reduce the number of files, perhaps
   with some other trade-off?
  
   Thanks
   - Marc
  
  
 
  


hadoop 0.15.3 r612257 freezes on reduce task

2008-01-28 Thread Florian Leibert

Hi,
I set up a cluster with 3 slaves and 1 master. When running the 
wordcount example, hadoop execution halts after the map phase: it 
starts the reduce phase and generally gets to 10-20%, but hangs there. 
I left it running for 30 minutes with no change, and the log files 
don't show any errors.
I reformatted the Hadoop FS that I created, and access to the files 
from all nodes seems fine.

Has anyone seen similar behavior?

Thanks,
Florian




Re: distributed cache

2008-01-28 Thread Amareshwari Sri Ramadasu

jerrro wrote:

Hello,

Is there a way to use the Distributed Cache with a pipes (C++ code) job? I
want to be able to access a file on the local disk on all of the data nodes,
so hadoop would copy it to every data node before a map/reduce job.

Thanks.
  


Hi,

First of all you need to copy the files to the dfs, and then add them 
to the distributed cache.
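For example, to stage a local file into the dfs (the paths here are just 
illustrative):

bin/hadoop dfs -put /local/data/file1 /files/file1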
You can give comma-separated lists of the files or archives to be added 
to the distributed cache as the values of mapred.cache.files and 
mapred.cache.archives in the conf file.

Ex:

<property>
 <name>mapred.cache.files</name>
 <value>/files/file1,/files/file2.txt</value>
 <description>The files in distributed cache</description>
</property>

<property>
 <name>mapred.cache.archives</name>
 <value>/archives/arc1.zip,/archives/arc2.jar</value>
 <description>The archives in distributed cache</description>
</property>

You can also give the files as URIs.
If you give a URI of the form hdfs://path#link, mapred will create a 
symlink named 'link' in the working directory.
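For example (the namenode host/port, file name, and link name below are 
made up):

<property>
 <name>mapred.cache.files</name>
 <value>hdfs://namenode:9000/files/dict.txt#dict</value>
 <description>Cached file, appearing as the symlink dict in the working
directory</description>
</property>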


Hope this clarifies

Thanks
Amareshwari



Re: Hbase Matrix Package for Map/Reduce-based Parallel Matrix Computations

2008-01-28 Thread edward yoon
 planning to have a domain specific language to support it.

Did you mean the MATLAB-like scientific language interface for matrix
computations? If so, you are welcome to join any time.

I made a matrix page on the hadoop wiki:
http://wiki.apache.org/hadoop/Matrix

Feel free to add your profile and your plans.

Thanks.
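
For concreteness, here is a minimal, HBase-independent Java sketch of the
conversion issue Chanwit describes below: parsing a string cell value on
every read versus storing the raw 8-byte encoding. The literal value is
made up and this does not use the HBase client API.

import java.nio.ByteBuffer;

public class DoubleCellSketch {
    public static void main(String[] args) {
        // Today a matrix entry read from the table is a string,
        // so every access pays a parse:
        byte[] asString = "3.14159".getBytes();
        double parsed = Double.parseDouble(new String(asString));

        // Storing the raw IEEE-754 bytes instead would avoid the parse:
        byte[] raw = ByteBuffer.allocate(8).putDouble(parsed).array();
        double decoded = ByteBuffer.wrap(raw).getDouble();

        System.out.println(parsed == decoded);  // prints true
    }
}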

On 1/28/08, Chanwit Kaewkasi [EMAIL PROTECTED] wrote:
 Hi Edward,

 I'm interested in your HBase Matrix package, and planning to have a
 domain specific language to support it. (I'm going to use Groovy for
 doing this)

 But from looking at your code, I've found that numerical data from HBase
 needs to be converted from string to double before computation.

 Is it possible to have native floating-point support in HBase for
 doing numerical computation?

 -Chanwit

 On 25/01/2008, edward yoon [EMAIL PROTECTED] wrote:
  I started developing the hbase matrix package and benchmarking the
  Map/Reduce + Hbase environment against the R-ScaLAPACK + ScaLAPACK +
  LAPACK + BLACS + BLAS + MPI environment.
 
  http://wiki.apache.org/hadoop/DataProcessingBenchmarks
 
  --
  B. Regards,
  Edward yoon @ NHN, corp.
 


 --
 Chanwit Kaewkasi
 PhD Student,
 Centre for Novel Computing
 School of Computer Science
 The University of Manchester
 Oxford Road
 Manchester
 M13 9PL, UK



-- 
B. Regards,
Edward yoon @ NHN, corp.