HBase Table Pool

2011-09-13 Thread jagaran das
Hi,

Has anybody used the HBase table pool (HTablePool) to connect to and load data into an HBase table?

Regards,
JD
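A minimal sketch of HTablePool usage against the HBase client API of that era
(roughly 0.90.x); the table name, column family, qualifier, and pool size are
made up purely for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.HTablePool;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HTablePoolExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // Pool up to 10 HTable instances per table name; tables are reused
            // across requests instead of being created for every write.
            HTablePool pool = new HTablePool(conf, 10);

            HTableInterface table = pool.getTable("my_table"); // hypothetical table
            try {
                Put put = new Put(Bytes.toBytes("row-1"));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("qual"), Bytes.toBytes("value"));
                table.put(put);
            } finally {
                // Return the table to the pool (later HBase versions use table.close()).
                pool.putTable(table);
            }
        }
    }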

Re: Hadoop doesn't use Replication Level of Namenode

2011-09-13 Thread Steve Loughran

On 13/09/11 05:02, Harsh J wrote:

Ralf,

There is no current way to 'fetch' a config at the moment. You have
the NameNode's config available at NNHOST:WEBPORT/conf page which you
can perhaps save as a resource (dynamically) and load into your
Configuration instance, but apart from this hack the only other ways
are the ones Bharath mentioned. This might lead to slow start ups of
your clients, but would give you the result you want.


I've done it in a modified version of Hadoop; all it takes is a servlet in 
the NN. It even served up live data on the addresses and ports the NN was 
running on, even if they weren't known in advance.
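A rough sketch of the hack Harsh describes, pulling the NameNode's /conf
servlet output into a client-side Configuration; the host name and web UI
port (50070) are assumptions for illustration:

    import java.io.InputStream;
    import java.net.URL;

    import org.apache.hadoop.conf.Configuration;

    public class RemoteConfLoader {
        public static void main(String[] args) throws Exception {
            // The NameNode serves its effective configuration as XML at /conf.
            URL confUrl = new URL("http://nnhost.example.com:50070/conf");

            Configuration conf = new Configuration();
            try (InputStream in = confUrl.openStream()) {
                // addResource(InputStream) merges the fetched XML into this instance.
                conf.addResource(in);
                // Resources load lazily, so read a property while the stream is open.
                System.out.println("dfs.replication = " + conf.get("dfs.replication"));
            }
        }
    }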




Re: hadoop+lucene

2011-09-13 Thread Linden Hillenbrand
Follow Harsh's suggestion for the source code in hadoop/contrib/index. But if
you want a distributed Lucene index, you can also take a look at Katta.

http://katta.sourceforge.net/

On Mon, Sep 12, 2011 at 8:42 PM, 27g  wrote:

> I want to use hadoop/contrib/index to create a distributed Lucene index on
> Hadoop. Could someone help me by giving me the source code of
> hadoop/contrib/index (Hadoop 0.20.2)? Thank you very much!
> (PS: My English is very poor, sorry)
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/hadoop-lucene-tp3331449p3331449.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>



-- 
Linden Hillenbrand
Customer Operations Engineer

Phone:  650.644.3900 x4946
Email:   lin...@cloudera.com
Twitter: @lhillenbrand
Data:http://www.cloudera.com


Re: Hadoop doesn't use Replication Level of Namenode

2011-09-13 Thread Edward Capriolo
On Tue, Sep 13, 2011 at 5:53 AM, Steve Loughran  wrote:

> On 13/09/11 05:02, Harsh J wrote:
>
>> Ralf,
>>
>> There is no current way to 'fetch' a config at the moment. You have
>> the NameNode's config available at NNHOST:WEBPORT/conf page which you
>> can perhaps save as a resource (dynamically) and load into your
>> Configuration instance, but apart from this hack the only other ways
>> are the ones Bharath mentioned. This might lead to slow start ups of
>> your clients, but would give you the result you want.
>>
>
> I've done it in a modified version of Hadoop; all it takes is a servlet in
> the NN. It even served up live data on the addresses and ports the NN was
> running on, even if they weren't known in advance.
>
>
Another technique: if you are using a single replication factor for all
files, you can mark the dfs.replication property as <final>true</final> in
the configuration of the NameNode and DataNode. This will always override
the client settings. However, in general it is best to manage client
configurations as carefully as you manage the server ones, and to ensure
that you give clients the configuration they MUST use via puppet/cfengine
etc. Essentially, do not count on clients to get these settings right,
because the risk is too high if they are set wrong, i.e. your situation:
"I thought everything was replicated 3 times."
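For illustration, a sketch of the hdfs-site.xml entry Edward appears to be
describing; note that, as Joey points out later in the thread, replication is
applied by the client, so a server-side final property does not bind clients:

    <!-- hdfs-site.xml on the NameNode and DataNodes (illustrative sketch) -->
    <property>
      <name>dfs.replication</name>
      <value>3</value>
      <!-- "final" prevents this daemon's own config from being overridden by
           later resources, but it does not constrain remote client settings. -->
      <final>true</final>
    </property>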


Re: error trying to setPermissions on hdfs file using api

2011-09-13 Thread Sateesh Lakkarsu
It was failing when I tried it on the namenode/datanode too, so it was not a
version issue.

It could be unrelated, but I came across
https://issues.apache.org/jira/browse/HADOOP-7629 ... and tried new
FsPermission(511) instead of FsPermission.createImmutable, and that
works.
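A small sketch of the call that ended up working, assuming a placeholder
file system and path; 511 decimal is 0777 octal:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class SetPermissionExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/user/example/data.txt"); // hypothetical path
            // 0777 octal == 511 decimal, the value mentioned in the thread.
            FsPermission perm = new FsPermission((short) 0777);
            fs.setPermission(file, perm);
        }
    }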


Outputformat and RecordWriter in Hadoop Pipes

2011-09-13 Thread Vivek K
Hi all,

I am trying to build a Hadoop/MR application in c++ using hadoop-pipes. I
have been able to successfully work with my own mappers and reducers, but
now I need to generate output (from reducer) in a format different from the
default TextOutputFormat. I have a few questions:

(1) Similar to Hadoop streaming, is there an option to set OutputFormat in
HadoopPipes (in order to use say org.apache.hadoop.io.SequenceFile.Writer) ?
I am using Hadoop version 0.20.2.

(2) For a simple test on how to use an in-built non-default writer, I tried
the following:

 hadoop pipes -D hadoop.pipes.java.recordreader=true -D
hadoop.pipes.java.recordwriter=false -input input.seq -output output
-inputformat org.apache.hadoop.mapred.SequenceFileInputFormat -writer
org.apache.hadoop.io.SequenceFile.Writer -program my_test_program

 However, this fails with a ClassNotFoundException. If I remove the
-writer flag and use the default writer, it works just fine.

(3) Is there some example or discussion related to how to write your own
RecordWriter and run it with Hadoop-pipes ?

Thanks.

Best,
Vivek
--


Re: Hadoop on vCloud Express

2011-09-13 Thread SSimko
Jignesh:

I noticed your post and wanted to reach out and offer some assistance.  I
work for a vCloud Express provider (Virtacore), and we do have an API; some
information is available in our Knowledge Base:
http://kb.virtacore.com/questions/52/Does+the+Virtacore+vCloud+Express+have+an+API%3F
If that is insufficient and we can be of any assistance, please reach out; we
pride ourselves on offering a little TLC when necessary.

I should mention that none of our engineers are familiar with Hadoop but we
will help where possible.

Scott

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Hadoop-on-vCloud-Express-tp3321247p3332949.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.


Hadoop/CDH + Avro

2011-09-13 Thread GOEKE, MATTHEW (AG/1000)
Would anyone happen to be able to share a good reference for Avro integration 
with Hadoop? I can find plenty of material on using Avro by itself, but I 
have found little to no documentation on how to implement it both as the 
protocol and as custom key/value types.

Thanks,
Matt
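A minimal sketch of the kind of integration being asked about, using the
org.apache.avro.mapred (old-API) AvroJob helpers; the schemas and paths are
invented for illustration, and the mapper/reducer subclasses are only
indicated in comments:

    import org.apache.avro.Schema;
    import org.apache.avro.mapred.AvroJob;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class AvroJobSetup {
        public static void main(String[] args) throws Exception {
            JobConf job = new JobConf(AvroJobSetup.class);
            job.setJobName("avro-example");

            // Avro schemas stand in for custom Writable key/value types; the
            // avro-mapred layer handles (de)serialization of Avro records.
            Schema input = Schema.create(Schema.Type.STRING);
            Schema output = Schema.create(Schema.Type.LONG);

            AvroJob.setInputSchema(job, input);
            AvroJob.setOutputSchema(job, output);
            // Subclasses of org.apache.avro.mapred.AvroMapper / AvroReducer
            // would be registered here via AvroJob.setMapperClass /
            // AvroJob.setReducerClass.

            FileInputFormat.setInputPaths(job, new Path("in"));   // hypothetical paths
            FileOutputFormat.setOutputPath(job, new Path("out"));

            // Submit (in a real job, after registering the mapper/reducer above).
            JobClient.runJob(job);
        }
    }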


RE: Is Hadoop the right platform for my HPC application?

2011-09-13 Thread Parker Jones

Thank you for the explanations, Bobby.  That helps significantly.

I also read the article below which gave me a better understanding of the 
relative merits of MapReduce/Hadoop vs MPI.  Alberto, you might find it useful 
too.
http://grids.ucs.indiana.edu/ptliupages/publications/CloudsandMR.pdf

There is even a MapReduce API built on top of MPI developed at Sandia.

So many options to choose from :-)

Cheers,
Parker

> From: ev...@yahoo-inc.com
> To: common-user@hadoop.apache.org
> Date: Mon, 12 Sep 2011 14:02:44 -0700
> Subject: Re: Is Hadoop the right platform for my HPC application?
> 
> Parker,
> 
> The hadoop command itself is just a shell script that sets up your classpath 
> and some environment variables for a JVM.  Hadoop provides a Java API that 
> you should be able to use to write your application, without dealing with the 
> command line.  That being said there is no Map/Reduce C/C++ API.  There is 
> libhdfs.so that will allow you to read/write HDFS files from a C/C++ program, 
> but it actually launches a JVM behind the scenes to handle the actual 
> requests.
> 
> As for a way to avoid writing your input data into files, the data has to be 
> distributed to the compute nodes somehow.  You could write a custom input 
> format that does not use any input files, and then have it load the data a 
> different way.  I believe that some people do this to load data from MySQL or 
> some other DB for processing.  Similarly you could do something with the 
> output format to put the data someplace else.
> 
> It is hard to say if Hadoop is the right platform without more information 
> about what you are doing.  Hadoop has been used for lots of embarrassingly 
> parallel problems.  The processing is easy, the real question is where is 
> your data coming from, and where are the results going.  Map/Reduce is fast 
> in part because it tries to reduce data movement and move the computation to 
> the data, not the other way round.  Without knowing the expected size of your 
> data or the amount of processing that it will do, it is hard to say.
> 
> --Bobby Evans
> 
> On 9/12/11 5:09 AM, "Parker Jones"  wrote:
> 
> 
> 
> Hello all,
> 
> I have Hadoop up and running and an embarrassingly parallel problem but can't 
> figure out how to arrange the problem.  My apologies in advance if this is 
> obvious and I'm not getting it.
> 
> My HPC application isn't a batch program, but runs in a continuous loop (like 
> a server) *outside* of the Hadoop machines, and it should occasionally farm 
> out a large computation to Hadoop and use the results.  However, all the 
> examples I have come across interact with Hadoop via files and the command 
> line.  (Perhaps I am looking at the wrong places?)
> 
> So,
> * is Hadoop the right platform for this kind of problem?
> * is it possible to use Hadoop without going through the command line and 
> writing all input data to files?
> 
> If so, could someone point me to some examples and documentation.  I am 
> coding in C/C++ in case that is relevant, but examples in any language should 
> be helpful.
> 
> Thanks for any suggestions,
> Parker
> 
> 
> 
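A rough sketch of the "custom input format that does not use any input files"
idea Bobby describes above, written against the new org.apache.hadoop.mapreduce
API; the split count and record count are arbitrary illustration values:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.InputFormat;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;

    /** Generates records in memory instead of reading input files. */
    public class SyntheticInputFormat extends InputFormat<LongWritable, NullWritable> {

        /** An empty split; all it does is claim a slice of work. */
        public static class EmptySplit extends InputSplit implements Writable {
            @Override public long getLength() { return 0; }
            @Override public String[] getLocations() { return new String[0]; }
            @Override public void write(DataOutput out) { }
            @Override public void readFields(DataInput in) { }
        }

        @Override
        public List<InputSplit> getSplits(JobContext context) {
            List<InputSplit> splits = new ArrayList<InputSplit>();
            for (int i = 0; i < 4; i++) {          // 4 map tasks, chosen arbitrarily
                splits.add(new EmptySplit());
            }
            return splits;
        }

        @Override
        public RecordReader<LongWritable, NullWritable> createRecordReader(
                InputSplit split, TaskAttemptContext context) {
            return new RecordReader<LongWritable, NullWritable>() {
                private final long total = 1000;   // records per split, arbitrary
                private long current = -1;

                @Override public void initialize(InputSplit s, TaskAttemptContext c) { }
                @Override public boolean nextKeyValue() { return ++current < total; }
                @Override public LongWritable getCurrentKey() { return new LongWritable(current); }
                @Override public NullWritable getCurrentValue() { return NullWritable.get(); }
                @Override public float getProgress() { return Math.min(1.0f, (current + 1) / (float) total); }
                @Override public void close() { }
            };
        }
    }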
  

Re: Can't print an ArrayWritable as a key even if it implements the ArrayWritable interface

2011-09-13 Thread W.P. McNeill
Now I'm not able to reproduce this bug. The only thing I've changed recently
is moving ItemSet down to a lower package (from wpmcn to wpmcn.structure),
but I'd be surprised if that was the problem.

I'll have to see if I can repro with an older version of the code.

I've implemented ItemSet differently than you've implemented your custom
writable. See https://gist.github.com/1214627. I don't think this is the
issue, though.


Re: Is Hadoop the right platform for my HPC application?

2011-09-13 Thread Robert Evans
Another option to think about is the Hamster project (MAPREDUCE-2911), which 
will allow OpenMPI to run on a Hadoop cluster.  It is still very preliminary 
and will probably not be ready until Hadoop 0.23 or 0.24.

There are other processing methodologies being developed to run on top of YARN 
(the resource scheduler introduced as part of Hadoop 0.23): 
http://wiki.apache.org/hadoop/PoweredByYarn

So there are even more choices coming depending on your problem.

--Bobby Evans

On 9/13/11 12:54 PM, "Parker Jones"  wrote:



Thank you for the explanations, Bobby.  That helps significantly.

I also read the article below which gave me a better understanding of the 
relative merits of MapReduce/Hadoop vs MPI.  Alberto, you might find it useful 
too.
http://grids.ucs.indiana.edu/ptliupages/publications/CloudsandMR.pdf

There is even a MapReduce API built on top of MPI developed at Sandia.

So many options to choose from :-)

Cheers,
Parker

> From: ev...@yahoo-inc.com
> To: common-user@hadoop.apache.org
> Date: Mon, 12 Sep 2011 14:02:44 -0700
> Subject: Re: Is Hadoop the right platform for my HPC application?
>
> Parker,
>
> The hadoop command itself is just a shell script that sets up your classpath 
> and some environment variables for a JVM.  Hadoop provides a Java API that 
> you should be able to use to write your application, without dealing with the 
> command line.  That being said there is no Map/Reduce C/C++ API.  There is 
> libhdfs.so that will allow you to read/write HDFS files from a C/C++ program, 
> but it actually launches a JVM behind the scenes to handle the actual 
> requests.
>
> As for a way to avoid writing your input data into files, the data has to be 
> distributed to the compute nodes somehow.  You could write a custom input 
> format that does not use any input files, and then have it load the data a 
> different way.  I believe that some people do this to load data from MySQL or 
> some other DB for processing.  Similarly you could do something with the 
> output format to put the data someplace else.
>
> It is hard to say if Hadoop is the right platform without more information 
> about what you are doing.  Hadoop has been used for lots of embarrassingly 
> parallel problems.  The processing is easy, the real question is where is 
> your data coming from, and where are the results going.  Map/Reduce is fast 
> in part because it tries to reduce data movement and move the computation to 
> the data, not the other way round.  Without knowing the expected size of your 
> data or the amount of processing that it will do, it is hard to say.
>
> --Bobby Evans
>
> On 9/12/11 5:09 AM, "Parker Jones"  wrote:
>
>
>
> Hello all,
>
> I have Hadoop up and running and an embarrassingly parallel problem but can't 
> figure out how to arrange the problem.  My apologies in advance if this is 
> obvious and I'm not getting it.
>
> My HPC application isn't a batch program, but runs in a continuous loop (like 
> a server) *outside* of the Hadoop machines, and it should occasionally farm 
> out a large computation to Hadoop and use the results.  However, all the 
> examples I have come across interact with Hadoop via files and the command 
> line.  (Perhaps I am looking at the wrong places?)
>
> So,
> * is Hadoop the right platform for this kind of problem?
> * is it possible to use Hadoop without going through the command line and 
> writing all input data to files?
>
> If so, could someone point me to some examples and documentation.  I am 
> coding in C/C++ in case that is relevant, but examples in any language should 
> be helpful.
>
> Thanks for any suggestions,
> Parker
>
>
>




Re: Hadoop doesn't use Replication Level of Namenode

2011-09-13 Thread Joey Echeverria
That won't work with the replication level, as that is entirely a
client-side config. You can partially control it by setting the
maximum replication level.

-Joey
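A sketch of the maximum-replication cap Joey mentions, set in hdfs-site.xml on
the NameNode; the limit of 3 is just an example value:

    <!-- hdfs-site.xml on the NameNode (illustrative sketch) -->
    <property>
      <name>dfs.replication.max</name>
      <value>3</value>
      <!-- Requests for a higher replication factor than this are rejected,
           which caps, but does not set, the client-chosen replication level. -->
    </property>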

On Tue, Sep 13, 2011 at 10:56 AM, Edward Capriolo  wrote:
> On Tue, Sep 13, 2011 at 5:53 AM, Steve Loughran  wrote:
>
>> On 13/09/11 05:02, Harsh J wrote:
>>
>>> Ralf,
>>>
>>> There is no current way to 'fetch' a config at the moment. You have
>>> the NameNode's config available at NNHOST:WEBPORT/conf page which you
>>> can perhaps save as a resource (dynamically) and load into your
>>> Configuration instance, but apart from this hack the only other ways
>>> are the ones Bharath mentioned. This might lead to slow start ups of
>>> your clients, but would give you the result you want.
>>>
>>
>> I've done it in a modified version of Hadoop; all it takes is a servlet in
>> the NN. It even served up live data on the addresses and ports the NN was
>> running on, even if they weren't known in advance.
>>
>>
> Another technique: if you are using a single replication factor for all
> files, you can mark the dfs.replication property as <final>true</final> in
> the configuration of the NameNode and DataNode. This will always override
> the client settings. However, in general it is best to manage client
> configurations as carefully as you manage the server ones, and to ensure
> that you give clients the configuration they MUST use via puppet/cfengine
> etc. Essentially, do not count on clients to get these settings right,
> because the risk is too high if they are set wrong, i.e. your situation:
> "I thought everything was replicated 3 times."
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434


Re: hadoop+lucene

2011-09-13 Thread 27g
Thank you very much!
Can the source code search for a single word in a file, or search for a file
in a directory?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/hadoop-lucene-tp3331449p3334637.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.