Re: Work on research project "Hadoop Security Design"

2013-02-27 Thread Nitin Pawar
Please start a new thread for your questions, Poonam.

If your data is in plain files, then you can do: bin/hadoop dfs -cat file/path

You can copy the file to the local filesystem with: bin/hadoop fs -copyToLocal
[-ignorecrc] [-crc] URI <localdst>
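
For completeness, the same copy can also be done programmatically; below is a
minimal sketch using the FileSystem API (the HDFS and local paths are
hypothetical placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyToLocalExample {
  public static void main(String[] args) throws Exception {
    // Picks up core-site.xml / hdfs-site.xml from the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical paths: an HDFS source file and a local destination.
    Path src = new Path("/user/poonam/data/file.txt");
    Path dst = new Path("/tmp/file.txt");

    // Equivalent to: bin/hadoop fs -copyToLocal <src> <localdst>
    fs.copyToLocalFile(src, dst);
    fs.close();
  }
}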


On Thu, Feb 28, 2013 at 12:57 PM, POONAM GHULI wrote:

>
> How do we extract data from the HDFS file system to our local system? Are there
> any commands available?
>
>>
>>
> regards
> poonam
>
>


-- 
Nitin Pawar


Re: Work on research project "Hadoop Security Design"

2013-02-27 Thread POONAM GHULI
How do we extract data from the HDFS file system to our local system? Are there
any commands available?

>
>
regards
poonam


Re: where reduce is copying?

2013-02-27 Thread Patai Sangbutsarakum
Thanks Harsh, you're always the first to reply.

Yeah, that really makes sense: the outputs of the mappers are copied inbound
to the running reduce task attempt.

Still, the speed of 0.44 MB/s looks pretty low to me.
I am trying to decide whether it is because there is not much data to copy,
because not all the mappers finish at the same time,
or because of a problem with the network itself (I already checked that bond0 is 1 Gb).

Thanks
Patai


On Wed, Feb 27, 2013 at 11:06 PM, Harsh J  wrote:
> The latter (from other machines, inbound to where the reduce is
> running, onto the reduce's local disk, via mapred.local.dir). The
> reduce will, obviously, copy outputs from all maps that may have
> produced data for its assigned partition ID.
>
> On Thu, Feb 28, 2013 at 12:27 PM, Patai Sangbutsarakum
>  wrote:
>> Good evening Hadoopers!
>>
>> On the JobTracker page, when I click on a job and then on a running reduce
>> task, I see:
>>
>> task_201302271736_0638_r_00 reduce > copy (136 of 261 at 0.44 MB/s)
>>
>> I am really curious where the data is being copied.
>> If I click on the task, it shows a host that is running the task
>> attempt.
>>
>> My question: does "reduce > copy" refer to data being copied outbound from
>> the host that is running the task attempt, or
>> to data being copied inbound to this host (the one running the task attempt)
>> from other machines?
>>
>> And in both cases, how do I know which machines that host is copying data from/to?
>>
>> Regards,
>> Patai
>
>
>
> --
> Harsh J


Re: where reduce is copying?

2013-02-27 Thread Harsh J
The latter (from other machines, inbound to where the reduce is
running, onto the reduce's local disk, via mapred.local.dir). The
reduce will, obviously, copy outputs from all maps that may have
produced data for its assigned partition ID.

On Thu, Feb 28, 2013 at 12:27 PM, Patai Sangbutsarakum
 wrote:
> Good evening Hadoopers!
>
> On the JobTracker page, when I click on a job and then on a running reduce
> task, I see:
>
> task_201302271736_0638_r_00 reduce > copy (136 of 261 at 0.44 MB/s)
>
> I am really curious where the data is being copied.
> If I click on the task, it shows a host that is running the task
> attempt.
>
> My question: does "reduce > copy" refer to data being copied outbound from
> the host that is running the task attempt, or
> to data being copied inbound to this host (the one running the task attempt)
> from other machines?
>
> And in both cases, how do I know which machines that host is copying data from/to?
>
> Regards,
> Patai



--
Harsh J


where reduce is copying?

2013-02-27 Thread Patai Sangbutsarakum
Good evening Hadoopers!

On the JobTracker page, when I click on a job and then on a running reduce
task, I see:

task_201302271736_0638_r_00 reduce > copy (136 of 261 at 0.44 MB/s)

I am really curious where the data is being copied.
If I click on the task, it shows a host that is running the task attempt.

My question: does "reduce > copy" refer to data being copied outbound from the
host that is running the task attempt, or
to data being copied inbound to this host (the one running the task attempt)
from other machines?

And in both cases, how do I know which machines that host is copying data from/to?

Regards,
Patai


Re: How to find Replication factor for one perticular folder in HDFS

2013-02-27 Thread Harsh J
Its "hdfs getconf", not "hdfs -getconf". The first sub-command is not
an option arg, generally, when using the hadoop/hdfs/yarn/mapred
scripts.
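
As an aside, the value Harsh describes is the configured default; the
replication actually recorded for each file under a particular folder can also
be read programmatically. A minimal sketch using the FileSystem API (the folder
path is a hypothetical placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowReplication {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Hypothetical folder to inspect.
    for (FileStatus status : fs.listStatus(new Path("/user/dhanasekaran/data"))) {
      if (!status.isDir()) {
        // getReplication() is the replication stored for this file, which may
        // differ from the dfs.replication default.
        System.out.println(status.getPath() + " -> replication " + status.getReplication());
      }
    }
    fs.close();
  }
}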

On Wed, Feb 27, 2013 at 3:40 PM, Dhanasekaran Anbalagan
 wrote:
> Hi YouPeng Yang,
>
> I have already configured dfs.replication factor=2
>
>>>  1. To get the key from configuration :
> /bin/hdfs -getconf -conKey dfs.replication
>
> hdfs@dvcliftonhera227:~$ hdfs -getconf -conKey dfs.replication
> Unrecognized option: -getconf
> Could not create the Java virtual machine.
>
> Please guide me.
>
> -Dhanasekaran
>
>
> Did I learn something today? If not, I wasted it.
>
>
> On Mon, Feb 25, 2013 at 7:31 PM, YouPeng Yang 
> wrote:
>>
>> Hi Dhanasekaran Anbalagan
>>
>>   1. To get the key from configuration :
>> /bin/hdfs -getconf -conKey dfs.replication
>>
>>
>> 2. Maybe you can add the attribute <final>true</final> to your
>> dfs.replication property:
>>
>> <property>
>>   <name>dfs.replication</name>
>>   <value>2</value>
>>   <final>true</final>
>> </property>
>>
>>
>> regards.
>>
>>
>>
>> 2013/2/26 Nitin Pawar 
>>>
>>> see if the link below helps you
>>>
>>>
>>> http://www.michael-noll.com/blog/2011/10/20/understanding-hdfs-quotas-and-hadoop-fs-and-fsck-tools/
>>>
>>>
>>> On Mon, Feb 25, 2013 at 10:36 PM, Dhanasekaran Anbalagan
>>>  wrote:

 Hi Guys,

 How do I query which replication factor a particular folder is configured
 with? In my cluster, some folders in HDFS are configured with a replication
 factor of 2 and some with 3. How do I query this?

 Please guide me.

 -Dhanasekaran

 Did I learn something today? If not, I wasted it.
>>>
>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>
>>
>



--
Harsh J


Re: How to take Whole Database From RDBMS to HDFS Instead of Table/Table

2013-02-27 Thread samir das mohapatra
Is it a good way to move a total of 5 PB of data through a Java/JDBC program?


On Wed, Feb 27, 2013 at 5:56 PM, Michel Segel wrote:

> I wouldn't use sqoop if you are taking everything.
> Simpler to write your own java/jdbc program that writes its output to HDFS.
>
> Just saying...
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Feb 27, 2013, at 5:15 AM, samir das mohapatra 
> wrote:
>
> thanks all.
>
>
>
> On Wed, Feb 27, 2013 at 4:41 PM, Jagat Singh  wrote:
>
>> You might want to read this
>>
>>
>> http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_import_all_tables_literal
>>
>>
>>
>>
>> On Wed, Feb 27, 2013 at 10:09 PM, samir das mohapatra <
>> samir.help...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>>Using sqoop how to take entire database table into HDFS insted of
>>> Table by Table ?.
>>>
>>> How do you guys did it?
>>> Is there some trick?
>>>
>>> Regards,
>>> samir.
>>>
>>
>>
>


Re: Work on research project "Hadoop Security Design"

2013-02-27 Thread Thomas Nguy
Thank you everyone for your answers; they give me a lot of avenues to explore.

Thanks Larry, I'll have to dig deeper into the "inter-cloud" system my uni is
using.

Thomas





 From: Charles Earl
To: "user@hadoop.apache.org"
Cc: "user@hadoop.apache.org"
Sent: Wednesday, 27 February 2013, 16:09
Subject: Re: Work on research project "Hadoop Security Design"
 

Thomas, 
Interesting distinctions articulated, thanks.
I knew of an effort called GatewayFS that had inter-cluster secure federation
as one goal; there are some similarities to Knox.
C

On Feb 27, 2013, at 10:04 AM, Larry McCay  wrote:


Hi Thomas -
>
>
>I think that you need to articulate the problems that you want to solve for your 
>university environment.
>The subject that you chose indicates "inter-cloud environment" - so depending 
>on the inter-cloud problems that currently exist for your environment there 
>may be interesting work from the Rhino effort or with Knox.
>
>
>It seems that you are leaning toward data protection and encryption as a 
>solution to some problem within your stated problem subject.
>I'd be interested in the usecase that you are addressing with it that is 
>"inter-cloud".
>Another family of issues that would be interesting in the inter-cloud space 
>would be various identity federation issues across clouds.
>
>
>@Charles - by GatewayFS do you mean HttpFS and are you asking whether Knox is 
>related to it?
>If so, Knox is not directly related to HttpFS though it will leverage lessons 
>learned and hopefully the experience of those involved.
>The Knox gateway is more transparent and committed to serving REST APIs to 
>numerous Hadoop services rather than just HDFS.
>The pluggable providers of Knox gateway will also facilitate easier 
>integration with customer's identity infrastructure in on-prem and cloud 
>provider environments.
>
>
>Hope that helps to draw the distinction between Knox and HttpFS.
>
>thanks,
>
>
>--larry
>
>
>On Wed, Feb 27, 2013 at 9:40 AM, Charles Earl  wrote:
>
>Is this in any way related to GatewayFS?
>>I am also curious whether any one knows of plans to incorporate homomorphic 
>>encryption or secure multiparty into the rhino effort.
>>C
>>
>>
>>On Feb 27, 2013, at 9:30 AM, Nitin Pawar wrote:
>>
>>I am not sure if you guys have heard it or not 
>>>
>>>
>>>HortonWorks is in process to incubate a new apache project called Knox for 
>>>hadoop security. 
>>>More on this you can look at 
>>>
>>>
>>>http://hortonworks.com/blog/introducing-knox-hadoop-security/
>>>
>>>
>>>
>>>http://wiki.apache.org/incubator/knox
>>>
>>>
>>>
>>>
>>>On Wed, Feb 27, 2013 at 7:51 PM, Thomas Nguy  wrote:
>>>
>>>Thank you very much Panshul, I'll take a look.


Thomas.




 From: Panshul Whisper
To: user@hadoop.apache.org; Thomas Nguy
Sent: Wednesday, 27 February 2013, 13:53
Subject: Re: Work on research project "Hadoop Security Design"
 

Hello Thomas,


you can look into this project. This is exactly what you are doing, but at 
a larger scale. 
https://github.com/intel-hadoop/project-rhino/



Hope this helps,


Regards,
Panshul



On Wed, Feb 27, 2013 at 1:49 PM, Thomas Nguy  wrote:

Hello developers !
>
>
>I'm a student at the french university "Ensimag" and currently doing my 
>master research on "Software security". Interested by cloud computing, I 
>chose for subject : "Secure hadoop cluster inter-cloud environment".
>My idea is to develop a framework in order to improve the security of the 
>Hadoop cluster running on the cloud of my uni. I have started by checking 
>the "Hadoop research projects" proposed  on Hadoop Wiki and the following 
>subject fits with mine:
>
>
>"Hadoop Security Design: 
>An end-to-end proposal for how to support authentication and client side 
>data encryption/decryption, so that large data sets can be stored in a 
>public HDFS and only jobs launched by authenticated users can map-reduce 
>or browse the data"
>
>
>I would like to know if there are already some developers on it so we can 
>discuss... To be honest, I'm kinda a "beginner" regarding Hadoop and cloud 
>cumputing so if would be really great if you had some advices or hints for 
>my research.
>
>
>Best regards.Thomas 



-- 

Regards,Ouch Whisper
010101010101


>>>
>>>
>>>
>>>-- 
>>>Nitin Pawar
>>>
>>
>

HDFS Benchmarking tools in Hadoop 2.0.3-alpha

2013-02-27 Thread Dheeren Bebortha
I am unable to locate the TestDFSIO benchmarking jar in the downloaded tar 
volume. Has it been deprecated?
Thanks,
-Dheeren Bebortha


Re: Datanodes shutdown and HBase's regionservers not working

2013-02-27 Thread Davey Yan
Yes, we confirmed that inappropriate use of NFS led to the high load
and to the lost heartbeats between cluster members.
There was an NFS partition pointing to one virtual machine for some
purpose, but that virtual machine shut down frequently.
BTW, that NFS partition was not for the backup of NN metadata, just for
another temporary purpose, and it has been removed now.
The NFS partition (with autofs) for NN metadata backup has no problem.

For more info, google "NFS high load"...


On Wed, Feb 27, 2013 at 9:58 AM, Jean-Marc Spaggiari
 wrote:
> Hi Davey,
>
> So were you able to find the issue?
>
> JM
>
> 2013/2/25 Davey Yan :
>> Hi Nicolas,
>>
>> I think I found what led to the shutdown of all of the datanodes, but I am
>> not completely certain.
>> I will return to this mailing list when my cluster is stable again.
>>
>> On Mon, Feb 25, 2013 at 8:01 PM, Nicolas Liochon  wrote:
>>> Network error messages are not always friendly, especially if there is a
>>> misconfiguration.
>>> This said,  "connection refused" says that the network connection was made,
>>> but that the remote port was not opened on the remote box. I.e. the process
>>> was dead.
>>> It could be useful to pastebin the whole logs as well...
>>>
>>>
>>> On Mon, Feb 25, 2013 at 12:44 PM, Davey Yan  wrote:

 But... there was no log like "network unreachable".


 On Mon, Feb 25, 2013 at 6:07 PM, Nicolas Liochon 
 wrote:
 > I agree.
 > Then for HDFS, ...
 > The first thing to check is the network I would say.
 >
 >
 >
 >
 > On Mon, Feb 25, 2013 at 10:46 AM, Davey Yan  wrote:
 >>
 >> Thanks for reply, Nicolas.
 >>
 >> My question: What can lead to shutdown of all of the datanodes?
 >> I believe that the regionservers will be OK if the HDFS is OK.
 >>
 >>
 >> On Mon, Feb 25, 2013 at 5:31 PM, Nicolas Liochon 
 >> wrote:
 >> > Ok, what's your question?
 >> > When you say the datanode went down, was it the datanode processes or
 >> > the
 >> > machines, with both the datanodes and the regionservers?
 >> >
 >> > The NameNode pings its datanodes every 3 seconds. However it will
 >> > internally
 >> > mark the datanodes as dead after 10:30 minutes (even if in the gui
 >> > you
 >> > have
 >> > 'no answer for x minutes').
 >> > HBase monitoring is done by ZooKeeper. By default, a regionserver is
 >> > considered as dead after 180s with no answer. Before, well, it's
 >> > considered
 >> > as live.
 >> > When you stop a regionserver, it tries to flush its data to the disk
 >> > (i.e.
 >> > hdfs, i.e. the datanodes). That's why if you have no datanodes, or if
 >> > a
 >> > high
 >> > ratio of your datanodes are dead, it can't shutdown. Connection
 >> > refused
 >> > &
 >> > socket timeouts come from the fact that before the 10:30 minutes hdfs
 >> > does
 >> > not declare the nodes as dead, so hbase tries to use them (and,
 >> > obviously,
 >> > fails). Note that there is now  an intermediate state for hdfs
 >> > datanodes,
 >> > called "stale": an intermediary state where the datanode is used only
 >> > if
 >> > you
 >> > have to (i.e. it's the only datanode with a block replica you need).
 >> > It
 >> > will
 >> > be documented in HBase for the 0.96 release. But if all your
 >> > datanodes
 >> > are
 >> > down it won't change much.
 >> >
 >> > Cheers,
 >> >
 >> > Nicolas
 >> >
 >> >
 >> >
 >> > On Mon, Feb 25, 2013 at 10:10 AM, Davey Yan 
 >> > wrote:
 >> >>
 >> >> Hey guys,
 >> >>
 >> >> We have a cluster with 5 nodes(1 NN and 4 DNs) running for more than
 >> >> 1
 >> >> year, and it works fine.
 >> >> But the datanodes got shutdown twice in the last month.
 >> >>
 >> >> When the datanodes got shutdown, all of them became "Dead Nodes" in
 >> >> the NN web admin UI(http://ip:50070/dfshealth.jsp),
 >> >> but regionservers of HBase were still live in the HBase web
 >> >> admin(http://ip:60010/master-status), of course, they were zombies.
 >> >> All of the processes of jvm were still running, including
 >> >> hmaster/namenode/regionserver/datanode.
 >> >>
 >> >> When the datanodes got shutdown, the load (using the "top" command)
 >> >> of
 >> >> slaves became very high, more than 10, higher than normal running.
 >> >> From the "top" command, we saw that the processes of datanode and
 >> >> regionserver were consuming CPU.
 >> >>
 >> >> We could not stop the HBase or Hadoop cluster through normal
 >> >> commands(stop-*.sh/*-daemon.sh stop *).
 >> >> So we stopped datanodes and regionservers by kill -9 PID, then the
 >> >> load of slaves returned to normal level, and we start the cluster
 >> >> again.
 >> >>
 >> >>
 >> >> Log of NN at the shutdown point(All of the DNs were remo

Re: Encryption in HDFS

2013-02-27 Thread Lance Norskog

Excellent!

On 02/25/2013 10:43 PM, Mathias Herberts wrote:

Encryption without proper key management only addresses the 'stolen
hard drive' problem.

So far I have not found 100% satisfactory solutions to this hard
problem. I've written OSS (Open Secret Server) partly to address this
problem in Pig, i.e. accessing encrypted data without embedding key
info into the job description file. Proper encrypted data handling
implies striict code review though, as in the case of Pig databags are
spillable and you could end up with unencrypted data stored on disk
without intent.

OSS http://github.com/hbs/oss and the Pig specific code:
https://github.com/hbs/oss/blob/master/src/main/java/com/geoxp/oss/pig/PigSecretStore.java

On Tue, Feb 26, 2013 at 6:33 AM, Seonyeong Bak  wrote:

I didn't handle the key distribution problem because I thought that this
problem is more difficult.
I simply hardcoded a key into the code.

Challenges related to security are handled in HADOOP-9331, MAPREDUCE-5025,
and so on.




Re: Correct way to unzip locally an archive in Yarn

2013-02-27 Thread Viral Bajaria
I was using 0.23 and was adding files using the -libjars flag (I wanted to
upload some jars which were dependencies for my project), but for some
reason I could never find them in the DistributedCache and would always keep
getting ClassNotFound on the other side. I took the snippet of code
which does that work when you invoke the hadoop jar command, put it in
my class, and it all worked fine. I don't know if the problem was with my
code or if it was a bug. Given that -files is so extensively used, I felt
it could be an issue on my side. In the end I started using Hadoop 1.1 and
so completely forgot about digging deeper, but I can definitely revive
it and dig in more.

Hopefully you can try using -libjars too and see if it faces a
similar issue, since both are command line switches which should have
almost identical behavior.

Thanks,
Viral
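
For anyone following along, here is a rough sketch of the kind of manual unzip
stub discussed further down this thread: it locates a -files-localized zip in
the distributed cache and extracts it to a local directory. The archive name
and target directory are hypothetical, and it is written against the old
org.apache.hadoop.filecache.DistributedCache API:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class LocalUnzipStub {

  /** Unzips the cached file whose name matches archiveName into targetDir. */
  public static void unzipCachedArchive(Configuration conf, String archiveName, File targetDir)
      throws Exception {
    Path[] cached = DistributedCache.getLocalCacheFiles(conf);
    if (cached == null) {
      throw new IllegalStateException("No files found in the distributed cache");
    }
    for (Path p : cached) {
      if (p.getName().equals(archiveName)) {
        unzip(new File(p.toString()), targetDir);
        return;
      }
    }
    throw new IllegalStateException(archiveName + " not found in the distributed cache");
  }

  private static void unzip(File zipFile, File targetDir) throws Exception {
    targetDir.mkdirs();
    ZipInputStream zis = new ZipInputStream(new FileInputStream(zipFile));
    try {
      byte[] buf = new byte[8192];
      ZipEntry entry;
      while ((entry = zis.getNextEntry()) != null) {
        File out = new File(targetDir, entry.getName());
        if (entry.isDirectory()) {
          out.mkdirs();
          continue;
        }
        out.getParentFile().mkdirs();
        FileOutputStream fos = new FileOutputStream(out);
        int n;
        while ((n = zis.read(buf)) > 0) {
          fos.write(buf, 0, n);
        }
        fos.close();
      }
    } finally {
      zis.close();
    }
  }
}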

On Tue, Feb 19, 2013 at 10:33 AM, Robert Evans  wrote:

> Yes if you can trace this down I would be very interested.  We are running
> 0.23.6 without any issues, but that does not mean that there is not some
> bug in the code that is causing this to happen in your situation.
>
> --Bobby
>
> From: Sebastiano Vigna 
> Reply-To: "user@hadoop.apache.org" 
> Date: Saturday, February 16, 2013 8:39 AM
> To: "user@hadoop.apache.org" 
> Subject: Re: Correct way to unzip locally an archive in Yarn
>
> I will as soon as I can understand what happens on the cluster (no access
> from home). DistributedCache.getLocalCacheFiles() returns in both cases a
> local name for the zip file uploaded with -files, but locally my unzip code
> works, on the cluster it throws a FileNotFoundException.
>
>
> On 16 February 2013 15:22, Arun C Murthy  wrote:
>
>> This could be a bug, mind opening a jira? Thanks.
>>
>> On Feb 16, 2013, at 2:34 AM, Sebastiano Vigna wrote:
>>
>> On 15 February 2013 16:57, Robert Evans  wrote:
>>
>>> Are you trying to run a Map/Reduce job or are you writing a new YARN
>>> application?  If it is a MR job, then it should work mostly the same as
>>> before (on 1.x). If you are writing a new YARN application then there is
>>> a
>>> separate Map in the ContainerLaunchContext that you need to fill in.
>>
>>
>> It's a MapReduce job (0.23.6). After two days of useless trials, I'm
>> uploading the zip with -files and I wrote a stub to unzip it manually. I
>> was positively unable to get the archive unzipped *to a local directory* in
>> any way.
>>
>> Unfortunately it works in local but not on the cluster. I have still to
>> discover why. :(
>>
>> Ciao,
>>
>>
>>
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>>
>


Re: Custom output value for map function

2013-02-27 Thread Sandy Ryza
That's right, the date needs to be written and read in the same order.

On Wed, Feb 27, 2013 at 11:04 AM, Paul van Hoven <
paul.van.ho...@googlemail.com> wrote:

> Great! Thank you.
>
> I guess the order for writing and reading the data this way is
> important. I mean, for
>
> out.writeUTF("blabla")
> out.writeInt(12)
>
> the following would be correct
>
> text = in.readUTF();
> number = in.readInt();
>
> and this would fail:
>
> number = in.readInt();
> text = in.readUTF();
>
> ?
>
> 2013/2/27 Sandy Ryza :
> > Hi Paul,
> >
> > To do this, you need to make your Dog class implement Hadoop's Writable
> > interface, so that it can be serialized to and deserialized from bytes.
> >
> http://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/io/Writable.html
> >
> > The methods you implement would look something like this:
> >
> > public void write(DataOutput out) {
> >   out.writeDouble(weight);
> >   out.writeUTF(name);
> >   out.writeLong(date.getTime());
> > }
> >
> > public void readFields(DataInput in) {
> >   weight = in.readDouble();
> >   name = in.readUTF();
> >   date = new Date(in.readLong());
> > }
> >
> > hope that helps,
> > Sandy
> >
> > On Wed, Feb 27, 2013 at 10:34 AM, Paul van Hoven
> >  wrote:
> >>
> >> The output value in the map function is in most examples for hadoop
> >> something like this:
> >>
> >> public static class Map extends Mapper<inputKey, inputValue, outputKey, outputValue>
> >>
> >> Normally outputValue is something like Text or IntWriteable.
> >>
> >> I got a custom class with its own properties like
> >>
> >> public class Dog {
> >>string name;
> >>Date birthday;
> >>double weight;
> >> }
> >>
> >> Now how would I accomplish the following map function:
> >>
> >> public static class Map extends Mapper<inputKey, inputValue, outputKey, Dog>
> >>
> >> ?
> >
> >
>


Re: Custom output value for map function

2013-02-27 Thread Paul van Hoven
Great! Thank you.

I guess the order for writing and reading the data this way is
important. I mean, for

out.writeUTF("blabla")
out.writeInt(12)

the following would be correct

text = in.readUTF();
number = in.readInt();

and this would fail:

number = in.readInt();
text = in.readUTF();

?

2013/2/27 Sandy Ryza :
> Hi Paul,
>
> To do this, you need to make your Dog class implement Hadoop's Writable
> interface, so that it can be serialized to and deserialized from bytes.
> http://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/io/Writable.html
>
> The methods you implement would look something like this:
>
> public void write(DataOutput out) {
>   out.writeDouble(weight);
>   out.writeUTF(name);
>   out.writeLong(date.getTime());
> }
>
> public void readFields(DataInput in) {
>   weight = in.readDouble();
>   name = in.readUTF();
>   date = new Date(in.readLong());
> }
>
> hope that helps,
> Sandy
>
> On Wed, Feb 27, 2013 at 10:34 AM, Paul van Hoven
>  wrote:
>>
>> The output value in the map function is in most examples for hadoop
>> something like this:
>>
> >> public static class Map extends Mapper<inputKey, inputValue, outputKey, outputValue>
>>
>> Normally outputValue is something like Text or IntWriteable.
>>
>> I got a custom class with its own properties like
>>
>> public class Dog {
>>string name;
>>Date birthday;
>>double weight;
>> }
>>
>> Now how would I accomplish the following map function:
>>
> >> public static class Map extends Mapper<inputKey, inputValue, outputKey, Dog>
>>
>> ?
>
>


Large static structures in M/R heap

2013-02-27 Thread Adam Phelps
We have a job that uses a large lookup structure that gets created as a
static class during the map setup phase (and we have the JVM reused so
this only takes place once).  However of late this structure has grown
drastically (due to items beyond our control) and we've seen a
substantial increase in map time due to the lower available memory.

Are there any easy solutions to this sort of problem?  My first thought
was to see if it was possible to have all tasks for a job execute in
parallel within the same JVM, but I'm not seeing any setting that would
allow that.  Beyond that my only ideas are to move that data into an
external one-per-node key-value store like memcached, but I'm worried
the additional overhead of sending a query for each value being mapped
would also kill the job performance.

- Adam


Re: Custom output value for map function

2013-02-27 Thread Sandy Ryza
Hi Paul,

To do this, you need to make your Dog class implement Hadoop's Writable
interface, so that it can be serialized to and deserialized from bytes.
http://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/io/Writable.html

The methods you implement would look something like this:

public void write(DataOutput out) {
  out.writeDouble(weight);
  out.writeUTF(name);
  out.writeLong(date.getTime());
}

public void readFields(DataInput in) {
  weight = in.readDouble();
  name = in.readUTF();
  date = new Date(in.readLong());
}
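
Pulling that together, a self-contained sketch of such a Writable class might
look like the following (assuming java.util.Date for the birthday field; note
the no-argument constructor, which Hadoop needs in order to instantiate the
object before calling readFields, and that readFields must consume the fields
in exactly the order write produced them):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Date;

import org.apache.hadoop.io.Writable;

public class Dog implements Writable {
  private String name;
  private Date birthday;
  private double weight;

  // Required by Hadoop: instances are created reflectively, then filled via readFields().
  public Dog() {}

  public Dog(String name, Date birthday, double weight) {
    this.name = name;
    this.birthday = birthday;
    this.weight = weight;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeDouble(weight);
    out.writeUTF(name);
    out.writeLong(birthday.getTime());
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // Same order as write(): double, UTF string, long.
    weight = in.readDouble();
    name = in.readUTF();
    birthday = new Date(in.readLong());
  }
}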

hope that helps,
Sandy

On Wed, Feb 27, 2013 at 10:34 AM, Paul van Hoven <
paul.van.ho...@googlemail.com> wrote:

> The output value in the map function is in most examples for hadoop
> something like this:
>
> public static class Map extends Mapper<inputKey, inputValue, outputKey, outputValue>
>
> Normally outputValue is something like Text or IntWriteable.
>
> I got a custom class with its own properties like
>
> public class Dog {
>string name;
>Date birthday;
>double weight;
> }
>
> Now how would I accomplish the following map function:
>
> public static class Map extends Mapper<inputKey, inputValue, outputKey, Dog>
>
> ?
>


Custom output value for map function

2013-02-27 Thread Paul van Hoven
The output value in the map function is in most examples for hadoop
something like this:

public static class Map extends Mapper<inputKey, inputValue, outputKey, outputValue>

Normally outputValue is something like Text or IntWritable.

I got a custom class with its own properties, like:

public class Dog {
    String name;
    Date birthday;
    double weight;
}

Now how would I accomplish the following map function:

public static class Map extends Mapper<inputKey, inputValue, outputKey, Dog>

?


Re: How to get Under-replicated blocks information [Location]

2013-02-27 Thread Shawn Higgins
When you run FSCK, what options did you run it with? If you're using 
'hadoop fsck /' then you will typically see a lot of dots being output and 
amongst those dots you should see missing/corrupt or under-replicated 
notices. Here's an excerpt containing under replicated blocks, for an 
example:


.
/user/blank/.staging/job_201302211222_0011/libjars/common.jar:  Under 
replicated blk_-856998378388135111_3050830. Target Replicas is 10 but found 
9 replica(s).
...
..
/user/blank/.staging/job_201302211222_0733/job.jar:  Under replicated 
blk_4441280820634984431_3150889. Target Replicas is 10 but found 9 
replica(s).
...
/user/blank/.staging/job_201302211222_0741/job.jar:  Under replicated 
blk_6336347773734645693_3150957. Target Replicas is 10 but found 9 
replica(s).
..
/user/blank/.staging/job_201302211222_0755/job.jar:  Under replicated 
blk_-1209937263563068132_3154083. Target Replicas is 10 but found 9 
replica(s).
.
/user/blank/.staging/job_201302211222_0756/job.jar:  Under replicated 
blk_-365467798112255961_3154084. Target Replicas is 10 but found 9 
replica(s).



[...]

The paths you see in the example are the HDFS locations within the cluster. 
That should be the information you're looking for.

Does this help?

-Shawn

On Wednesday, February 27, 2013 4:17:18 AM UTC-6, Dhanasekaran Anbalagan 
wrote:
>
> Hi Guys,
>
> I am running three machine cluster, with replication factor 2, I got 
> problem in replica i changed to 2.
> after I ran fsck  i got  Under-replicated blocks: 71 (0.0034828386 %)
>
>  Total size: 105829415143 B
>  Total dirs: 9704
>  Total files: 2038873 (Files currently being written: 2)
>  Total blocks (validated): 2038567 (avg. block size 51913 B) (Total open 
> file blocks (not validated): 2)
>  Minimally replicated blocks: 2038567 (100.0 %)
>  Over-replicated blocks: 0 (0.0 %)
>  Under-replicated blocks: 71 (0.0034828386 %)
>  Mis-replicated blocks: 0 (0.0 %)
>  Default replication factor: 2
>  Average block replication: 1.995
>  Corrupt blocks: 0
>  Missing replicas: 71 (0.0017414198 %)
>  Number of data-nodes: 2
>  Number of racks: 1
> FSCK ended at Wed Feb 27 05:12:31 EST 2013 in 32647 milliseconds
>
>
> Please guide what the block location of the 71. in HDFS file system.
>
> -Dhanasekaran.
> Did I learn something today? If not, I wasted it.
>  


Re: Work on research project "Hadoop Security Design"

2013-02-27 Thread Charles Earl
Thomas, 
Interesting distinctions articulated, thanks.
I knew of an effort called GatewayFS that had inter-cluster secure federation
as one goal; there are some similarities to Knox.
C

On Feb 27, 2013, at 10:04 AM, Larry McCay  wrote:

> Hi Thomas -
> 
> I think that you need to articulate the problems that you want to solve for your 
> university environment.
> The subject that you chose indicates "inter-cloud environment" - so depending 
> on the inter-cloud problems that currently exist for your environment there 
> may be interesting work from the Rhino effort or with Knox.
> 
> It seems that you are leaning toward data protection and encryption as a 
> solution to some problem within your stated problem subject.
> I'd be interested in the usecase that you are addressing with it that is 
> "inter-cloud".
> Another family of issues that would be interesting in the inter-cloud space 
> would be various identity federation issues across clouds.
> 
> @Charles - by GatewayFS do you mean HttpFS and are you asking whether Knox is 
> related to it?
> If so, Knox is not directly related to HttpFS though it will leverage lessons 
> learned and hopefully the experience of those involved.
> The Knox gateway is more transparent and committed to serving REST APIs to 
> numerous Hadoop services rather than just HDFS.
> The pluggable providers of Knox gateway will also facilitate easier 
> integration with customer's identity infrastructure in on-prem and cloud 
> provider environments.
> 
> Hope that helps to draw the distinction between Knox and HttpFS.
> 
> thanks,
> 
> --larry
> 
> On Wed, Feb 27, 2013 at 9:40 AM, Charles Earl  wrote:
>> Is this in any way related to GatewayFS?
>> I am also curious whether any one knows of plans to incorporate homomorphic 
>> encryption or secure multiparty into the rhino effort.
>> C
>> 
>> On Feb 27, 2013, at 9:30 AM, Nitin Pawar wrote:
>> 
>>> I am not sure if you guys have heard it or not 
>>> 
>>> HortonWorks is in process to incubate a new apache project called Knox for 
>>> hadoop security. 
>>> More on this you can look at 
>>> 
>>> http://hortonworks.com/blog/introducing-knox-hadoop-security/
>>> 
>>> http://wiki.apache.org/incubator/knox
>>> 
>>> 
>>> On Wed, Feb 27, 2013 at 7:51 PM, Thomas Nguy  wrote:
 Thank you very much Panshul, I'll take a look.
 
 Thomas.
 
 From: Panshul Whisper
 To: user@hadoop.apache.org; Thomas Nguy
 Sent: Wednesday, 27 February 2013, 13:53
 Subject: Re: Work on research project "Hadoop Security Design"
 
 Hello Thomas,
 
 you can look into this project. This is exactly what you are doing, but at 
 a larger scale. 
 https://github.com/intel-hadoop/project-rhino/
 
 Hope this helps,
 
 Regards,
 Panshul
 
 
 On Wed, Feb 27, 2013 at 1:49 PM, Thomas Nguy  wrote:
 Hello developers !
 
 I'm a student at the french university "Ensimag" and currently doing my 
 master research on "Software security". Interested by cloud computing, I 
 chose for subject : "Secure hadoop cluster inter-cloud environment".
 My idea is to develop a framework in order to improve the security of the 
 Hadoop cluster running on the cloud of my uni. I have started by checking 
 the "Hadoop research projects" proposed  on Hadoop Wiki and the following 
 subject fits with mine:
 
 "Hadoop Security Design: 
An end-to-end proposal for how to support authentication and client 
 side data encryption/decryption, so that large data sets can be stored in 
 a public HDFS and only jobs launched by authenticated users can map-reduce 
 or browse the data"
 
 I would like to know if there are already some developers on it so we can 
 discuss... To be honest, I'm kinda a "beginner" regarding Hadoop and cloud 
 cumputing so if would be really great if you had some advices or hints for 
 my research.
 
 Best regards.
 Thomas 
 
 
 
 -- 
 Regards,
 Ouch Whisper
 010101010101
>>> 
>>> 
>>> 
>>> -- 
>>> Nitin Pawar
> 


Re: Work on research project "Hadoop Security Design"

2013-02-27 Thread Larry McCay
Hi Thomas -

I think that you need to articulate the problems that you want to solve for
your university environment.
The subject that you chose indicates "inter-cloud environment" - so
depending on the inter-cloud problems that currently exist for your
environment there may be interesting work from the Rhino effort or with
Knox.

It seems that you are leaning toward data protection and encryption as a
solution to some problem within your stated problem subject.
I'd be interested in the usecase that you are addressing with it that is
"inter-cloud".
Another family of issues that would be interesting in the inter-cloud space
would be various identity federation issues across clouds.

@Charles - by GatewayFS do you mean HttpFS and are you asking whether Knox
is related to it?
If so, Knox is not directly related to HttpFS though it will leverage
lessons learned and hopefully the experience of those involved.
The Knox gateway is more transparent and committed to serving REST APIs to
numerous Hadoop services rather than just HDFS.
The pluggable providers of Knox gateway will also facilitate easier
integration with customer's identity infrastructure in on-prem and cloud
provider environments.

Hope that helps to draw the distinction between Knox and HttpFS.

thanks,

--larry

On Wed, Feb 27, 2013 at 9:40 AM, Charles Earl wrote:

> Is this in any way related to GatewayFS?
> I am also curious whether any one knows of plans to incorporate
> homomorphic encryption or secure multiparty into the rhino effort.
> C
>
> On Feb 27, 2013, at 9:30 AM, Nitin Pawar wrote:
>
> I am not sure if you guys have heard it or not
>
> HortonWorks is in process to incubate a new apache project called Knox for
> hadoop security.
> More on this you can look at
>
> http://hortonworks.com/blog/introducing-knox-hadoop-security/
>
> http://wiki.apache.org/incubator/knox
>
>
> On Wed, Feb 27, 2013 at 7:51 PM, Thomas Nguy  wrote:
>
>> Thank you very much Panshul, I'll take a look.
>>
>> Thomas.
>>
>>   --
>> *From:* Panshul Whisper
>> *To:* user@hadoop.apache.org; Thomas Nguy
>> *Sent:* Wednesday, 27 February 2013, 13:53
>> *Subject:* Re: Work on research project "Hadoop Security Design"
>>
>> Hello Thomas,
>>
>> you can look into this project. This is exactly what you are doing, but
>> at a larger scale.
>> https://github.com/intel-hadoop/project-rhino/
>>
>> Hope this helps,
>>
>> Regards,
>> Panshul
>>
>>
>> On Wed, Feb 27, 2013 at 1:49 PM, Thomas Nguy wrote:
>>
>> Hello developers !
>>
>> I'm a student at the french university "Ensimag" and currently doing my
>> master research on "Software security". Interested by cloud computing, I
>> chose for subject : "Secure hadoop cluster inter-cloud environment".
>> My idea is to develop a framework in order to improve the security of the
>> Hadoop cluster running on the cloud of my uni. I have started by
>> checking the "Hadoop research projects" proposed  on Hadoop Wiki and the
>> following subject fits with mine:
>>
>> "Hadoop Security Design:
>>  An end-to-end proposal for how to support authentication and client
>> side data encryption/decryption, so that large data sets can be stored in a
>> public HDFS and only jobs launched by authenticated users can map-reduce or
>> browse the data"
>>
>> I would like to know if there are already some developers on it so we can
>> discuss... To be honest, I'm kinda a "beginner" regarding Hadoop and
>> cloud cumputing so if would be really great if you had some advices or
>> hints for my research.
>>
>> Best regards.
>> Thomas
>>
>>
>>
>>
>> --
>> Regards,
>> Ouch Whisper
>> 010101010101
>>
>>
>>
>
>
> --
> Nitin Pawar
>
>
>


Re: Work on research project "Hadoop Security Design"

2013-02-27 Thread Charles Earl
Is this in any way related to GatewayFS?
I am also curious whether anyone knows of plans to incorporate homomorphic
encryption or secure multiparty computation into the Rhino effort.
C
On Feb 27, 2013, at 9:30 AM, Nitin Pawar wrote:

> I am not sure if you guys have heard it or not 
> 
> HortonWorks is in process to incubate a new apache project called Knox for 
> hadoop security. 
> More on this you can look at 
> 
> http://hortonworks.com/blog/introducing-knox-hadoop-security/
> 
> http://wiki.apache.org/incubator/knox
> 
> 
> On Wed, Feb 27, 2013 at 7:51 PM, Thomas Nguy  wrote:
> Thank you very much Panshul, I'll take a look.
> 
> Thomas.
> 
> From: Panshul Whisper
> To: user@hadoop.apache.org; Thomas Nguy
> Sent: Wednesday, 27 February 2013, 13:53
> Subject: Re: Work on research project "Hadoop Security Design"
> 
> Hello Thomas,
> 
> you can look into this project. This is exactly what you are doing, but at a 
> larger scale. 
> https://github.com/intel-hadoop/project-rhino/
> 
> Hope this helps,
> 
> Regards,
> Panshul
> 
> 
> On Wed, Feb 27, 2013 at 1:49 PM, Thomas Nguy  wrote:
> Hello developers !
> 
> I'm a student at the french university "Ensimag" and currently doing my 
> master research on "Software security". Interested by cloud computing, I 
> chose for subject : "Secure hadoop cluster inter-cloud environment".
> My idea is to develop a framework in order to improve the security of the 
> Hadoop cluster running on the cloud of my uni. I have started by checking the 
> "Hadoop research projects" proposed  on Hadoop Wiki and the following subject 
> fits with mine:
> 
> "Hadoop Security Design: 
>   An end-to-end proposal for how to support authentication and client 
> side data encryption/decryption, so that large data sets can be stored in a 
> public HDFS and only jobs launched by authenticated users can map-reduce or 
> browse the data"
> 
> I would like to know if there are already some developers on it so we can 
> discuss... To be honest, I'm kinda a "beginner" regarding Hadoop and cloud 
> cumputing so if would be really great if you had some advices or hints for my 
> research.
> 
> Best regards.
> Thomas 
> 
> 
> 
> -- 
> Regards,
> Ouch Whisper
> 010101010101
> 
> 
> 
> 
> 
> -- 
> Nitin Pawar



Re: Work on research project "Hadoop Security Design"

2013-02-27 Thread Nitin Pawar
I am not sure if you guys have heard of it or not, but

Hortonworks is in the process of incubating a new Apache project called Knox
for Hadoop security.
You can read more about it at:

http://hortonworks.com/blog/introducing-knox-hadoop-security/

http://wiki.apache.org/incubator/knox


On Wed, Feb 27, 2013 at 7:51 PM, Thomas Nguy  wrote:

> Thank you very much Panshul, I'll take a look.
>
> Thomas.
>
>   --
> *From:* Panshul Whisper
> *To:* user@hadoop.apache.org; Thomas Nguy
> *Sent:* Wednesday, 27 February 2013, 13:53
> *Subject:* Re: Work on research project "Hadoop Security Design"
>
> Hello Thomas,
>
> you can look into this project. This is exactly what you are doing, but at
> a larger scale.
> https://github.com/intel-hadoop/project-rhino/
>
> Hope this helps,
>
> Regards,
> Panshul
>
>
> On Wed, Feb 27, 2013 at 1:49 PM, Thomas Nguy  wrote:
>
> Hello developers !
>
> I'm a student at the french university "Ensimag" and currently doing my
> master research on "Software security". Interested by cloud computing, I
> chose for subject : "Secure hadoop cluster inter-cloud environment".
> My idea is to develop a framework in order to improve the security of the
> Hadoop cluster running on the cloud of my uni. I have started by checking
> the "Hadoop research projects" proposed  on Hadoop Wiki and the following
> subject fits with mine:
>
> "Hadoop Security Design:
>  An end-to-end proposal for how to support authentication and client side
> data encryption/decryption, so that large data sets can be stored in a
> public HDFS and only jobs launched by authenticated users can map-reduce or
> browse the data"
>
> I would like to know if there are already some developers on it so we can
> discuss... To be honest, I'm kinda a "beginner" regarding Hadoop and
> cloud cumputing so if would be really great if you had some advices or
> hints for my research.
>
> Best regards.
> Thomas
>
>
>
>
> --
> Regards,
> Ouch Whisper
> 010101010101
>
>
>


-- 
Nitin Pawar


Re: Work on research project "Hadoop Security Design"

2013-02-27 Thread Thomas Nguy
Thank you very much Panshul, I'll take a look.

Thomas.



 From: Panshul Whisper
To: user@hadoop.apache.org; Thomas Nguy
Sent: Wednesday, 27 February 2013, 13:53
Subject: Re: Work on research project "Hadoop Security Design"
 

Hello Thomas,

you can look into this project. This is exactly what you are doing, but at a 
larger scale. 
https://github.com/intel-hadoop/project-rhino/


Hope this helps,

Regards,
Panshul



On Wed, Feb 27, 2013 at 1:49 PM, Thomas Nguy  wrote:

Hello developers !
>
>
>I'm a student at the french university "Ensimag" and currently doing my master 
>research on "Software security". Interested by cloud computing, I chose for 
>subject : "Secure hadoop cluster inter-cloud environment".
>My idea is to develop a framework in order to improve the security of the 
>Hadoop cluster running on the cloud of my uni. I have started by checking the 
>"Hadoop research projects" proposed  on Hadoop Wiki and the following subject 
>fits with mine:
>
>
>"Hadoop Security Design: 
>An end-to-end proposal for how to support authentication and client side data 
>encryption/decryption, so that large data sets can be stored in a public HDFS 
>and only jobs launched by authenticated users can map-reduce or browse the 
>data"
>
>
>I would like to know if there are already some developers on it so we can 
>discuss... To be honest, I'm kinda a "beginner" regarding Hadoop and cloud 
>cumputing so if would be really great if you had some advices or hints for my 
>research.
>
>
>Best regards.Thomas 


-- 

Regards,Ouch Whisper
010101010101

Re: Encryption in HDFS

2013-02-27 Thread Michael Segel
You can encrypt the splits separately. 

The issue of key management is actually a layer above this. 

Looks like the research is on the encryption process with a known key.
The layer above would handle key management, which can be done a couple of
different ways...

On Feb 26, 2013, at 1:52 PM, java8964 java8964  wrote:

> I am also interested in your research. Can you share some insight about the 
> following questions?
> 
> 1) When you use a CompressionCodec, can the encrypted file be split? From my
> understanding, there is no encryption scheme that lets the file be decrypted
> individually by block, right? For example, if I have a 1 GB file encrypted using
> AES, how do you (or can you) decrypt the file block by block, instead of just
> using one mapper to decrypt the whole file?
> 2) In your CompressionCodec implementation, do you use the DecompressorStream
> or the BlockDecompressorStream? If BlockDecompressorStream, can you share some
> examples? Right now, I have some problems using BlockDecompressorStream to
> do exactly the same thing as you did.
> 3) Do you have any plans to share your code, especially if you did use
> BlockDecompressorStream and got the encrypted file decrypted block by block
> in the Hadoop MapReduce job?
> 
> Thanks
> 
> Yong
> 
> From: render...@gmail.com
> Date: Tue, 26 Feb 2013 14:10:08 +0900
> Subject: Encryption in HDFS
> To: user@hadoop.apache.org
> 
> Hello, I'm a university student.
> 
> I implemented AES and Triple DES as a CompressionCodec using the Java
> Cryptography Architecture (JCA).
> The encryption is performed by a client node using the Hadoop API.
> Map tasks read blocks from HDFS and these blocks are decrypted by each map
> task.
> I tested my implementation against generic HDFS.
> My cluster consists of 3 nodes (1 master node, 3 worker nodes) and each
> machine has a quad-core processor (i7-2600) and 4 GB of memory.
> The test input is 1 TB of text, consisting of 32 text files (32 GB each).
>
> I expected the encryption to take much more time than generic HDFS,
> but the performance does not differ significantly.
> The decryption step takes about 5-7% more time than generic HDFS.
> The encryption step takes about 20-30% more time than generic HDFS because it
> is implemented as a single thread and executed by one client node.
> So the encryption step could still be made faster.
>
> Could there be any error in my test?
>
> I know there are several implementations for encrypting files in HDFS.
> Are these implementations enough to secure HDFS?
> 
> best regards,
> 
> seonpark
> 
> * Sorry for my bad english 
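
To make the stream-wrapping idea in this thread concrete, here is a minimal
sketch using javax.crypto directly. This is not the poster's actual codec: the
cipher mode, the hard-coded key, and the fixed IV are illustrative assumptions
only, and a hard-coded key is precisely the key-management gap discussed above.

import java.io.InputStream;
import java.io.OutputStream;

import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Wraps plain streams with AES encryption/decryption, the way a codec might. */
public class AesStreams {

  // Illustrative only: a real deployment would obtain key and IV from a key manager.
  private static final byte[] KEY = "0123456789abcdef".getBytes(); // 16 bytes = 128-bit AES key
  private static final byte[] IV  = "fedcba9876543210".getBytes(); // 16-byte IV

  public static OutputStream wrapForEncryption(OutputStream raw) throws Exception {
    Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
    cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(KEY, "AES"), new IvParameterSpec(IV));
    return new CipherOutputStream(raw, cipher);
  }

  public static InputStream wrapForDecryption(InputStream raw) throws Exception {
    Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
    cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(KEY, "AES"), new IvParameterSpec(IV));
    return new CipherInputStream(raw, cipher);
  }
}

Whether data written this way can be split and decrypted per HDFS block depends
on the cipher mode and on how IVs and padding are handled, which is essentially
the block-by-block question raised earlier in this thread.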



Re: Work on research project "Hadoop Security Design"

2013-02-27 Thread Panshul Whisper
Hello Thomas,

you can look into this project. This is exactly what you are doing, but at
a larger scale.
https://github.com/intel-hadoop/project-rhino/

Hope this helps,

Regards,
Panshul


On Wed, Feb 27, 2013 at 1:49 PM, Thomas Nguy  wrote:

> Hello developers !
>
> I'm a student at the french university "Ensimag" and currently doing my
> master research on "Software security". Interested by cloud computing, I
> chose for subject : "Secure hadoop cluster inter-cloud environment".
> My idea is to develop a framework in order to improve the security of the
> Hadoop cluster running on the cloud of my uni. I have started by checking
> the "Hadoop research projects" proposed  on Hadoop Wiki and the following
> subject fits with mine:
>
> "Hadoop Security Design:
> An end-to-end proposal for how to support authentication and client side
> data encryption/decryption, so that large data sets can be stored in a
> public HDFS and only jobs launched by authenticated users can map-reduce or
> browse the data"
>
> I would like to know if there are already some developers on it so we can
> discuss... To be honest, I'm kinda a "beginner" regarding Hadoop and
> cloud cumputing so if would be really great if you had some advices or
> hints for my research.
>
> Best regards.
> Thomas
>



-- 
Regards,
Ouch Whisper
010101010101


Work on research project "Hadoop Security Design"

2013-02-27 Thread Thomas Nguy
Hello developers !

I'm a student at the French university "Ensimag" and currently doing my master's
research on "Software security". Interested in cloud computing, I chose as my
subject: "Secure Hadoop cluster inter-cloud environment".
My idea is to develop a framework in order to improve the security of the
Hadoop cluster running on my university's cloud. I have started by checking the
"Hadoop research projects" proposed on the Hadoop Wiki, and the following subject
fits with mine:

"Hadoop Security Design: 
An end-to-end proposal for how to support authentication and client side data 
encryption/decryption, so that large data sets can be stored in a public HDFS 
and only jobs launched by authenticated users can map-reduce or browse the data"

I would like to know if there are already some developers working on it so we can
discuss... To be honest, I'm kind of a "beginner" regarding Hadoop and cloud
computing, so it would be really great if you had some advice or hints for my
research.

Best regards.
Thomas 

Re: How to take Whole Database From RDBMS to HDFS Instead of Table/Table

2013-02-27 Thread Michel Segel
I wouldn't use sqoop if you are taking everything.
Simpler to write your own java/jdbc program that writes its output to HDFS.

Just saying...

Sent from a remote device. Please excuse any typos...

Mike Segel
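
For illustration, a minimal sketch of that kind of Java/JDBC-to-HDFS copy for a
single table (the JDBC URL, credentials, table name, and output path are all
hypothetical placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class JdbcToHdfs {
  public static void main(String[] args) throws Exception {
    // Hypothetical connection details and output location.
    Connection db = DriverManager.getConnection(
        "jdbc:mysql://dbhost:3306/mydb", "user", "password");
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/user/samir/export/mytable.csv"));

    Statement stmt = db.createStatement();
    ResultSet rs = stmt.executeQuery("SELECT * FROM mytable");
    ResultSetMetaData meta = rs.getMetaData();
    int cols = meta.getColumnCount();

    while (rs.next()) {
      StringBuilder line = new StringBuilder();
      for (int i = 1; i <= cols; i++) {
        if (i > 1) line.append(',');
        line.append(rs.getString(i)); // naive CSV: no quoting or escaping
      }
      line.append('\n');
      out.write(line.toString().getBytes("UTF-8"));
    }

    out.close();
    rs.close();
    stmt.close();
    db.close();
  }
}

For a whole database you would iterate over DatabaseMetaData.getTables() and
repeat this per table; at 5 PB, though, how you parallelize the copy matters
far more than which tool drives it.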

On Feb 27, 2013, at 5:15 AM, samir das mohapatra  
wrote:

> thanks all.
> 
> 
> 
> On Wed, Feb 27, 2013 at 4:41 PM, Jagat Singh  wrote:
>> You might want to read this
>> 
>> http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_import_all_tables_literal
>> 
>> 
>> 
>> 
>> On Wed, Feb 27, 2013 at 10:09 PM, samir das mohapatra 
>>  wrote:
>>> Hi All,
>>> 
>>>Using sqoop how to take entire database table into HDFS insted of Table 
>>> by Table ?.
>>> 
>>> How do you guys did it?
>>> Is there some trick?
>>> 
>>> Regards,
>>> samir.
> 


Re: How to take Whole Database From RDBMS to HDFS Instead of Table/Table

2013-02-27 Thread samir das mohapatra
thanks all.



On Wed, Feb 27, 2013 at 4:41 PM, Jagat Singh  wrote:

> You might want to read this
>
>
> http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_import_all_tables_literal
>
>
>
>
> On Wed, Feb 27, 2013 at 10:09 PM, samir das mohapatra <
> samir.help...@gmail.com> wrote:
>
>> Hi All,
>>
>>Using sqoop how to take entire database table into HDFS insted of
>> Table by Table ?.
>>
>> How do you guys did it?
>> Is there some trick?
>>
>> Regards,
>> samir.
>>
>
>


Re: How to take Whole Database From RDBMS to HDFS Instead of Table/Table

2013-02-27 Thread Jagat Singh
You might want to read this

http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_import_all_tables_literal



On Wed, Feb 27, 2013 at 10:09 PM, samir das mohapatra <
samir.help...@gmail.com> wrote:

> Hi All,
>
>Using sqoop how to take entire database table into HDFS insted of Table
> by Table ?.
>
> How do you guys did it?
> Is there some trick?
>
> Regards,
> samir.
>


Re: How to take Whole Database From RDBMS to HDFS Instead of Table/Table

2013-02-27 Thread Kai Voigt
http://sqoop.apache.org/docs/1.4.1-incubating/SqoopUserGuide.html#_literal_sqoop_import_all_tables_literal
 is your friend

Kai

On 27.02.2013 at 12:09, samir das mohapatra wrote:

> Hi All,
> 
>Using sqoop how to take entire database table into HDFS insted of Table by 
> Table ?.
> 
> How do you guys did it?
> Is there some trick?
> 
> Regards,
> samir.

-- 
Kai Voigt
k...@123.org






How to take Whole Database From RDBMS to HDFS Instead of Table/Table

2013-02-27 Thread samir das mohapatra
Hi All,

   Using Sqoop, how can I take an entire database into HDFS instead of going
table by table?

How did you guys do it?
Is there some trick?

Regards,
samir.


RE: How to get Under-replicated blocks information [Location]

2013-02-27 Thread Brahma Reddy Battula
You can get the block locations (assuming you are asking which nodes hold them)
in two ways:

i) ./hadoop fsck / -files -blocks -locations (command line)
ii) NameNode UI (go to the UI, click on "Browse the filesystem", then click on
the file you want to check; the block locations are shown below the file details).
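
The same information is also available programmatically; a minimal sketch using
the FileSystem API (the file path is a hypothetical placeholder):

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Hypothetical file to inspect.
    FileStatus status = fs.getFileStatus(new Path("/user/dhanasekaran/somefile"));
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

    for (BlockLocation block : blocks) {
      System.out.println("offset " + block.getOffset()
          + " length " + block.getLength()
          + " hosts " + Arrays.toString(block.getHosts()));
    }
    fs.close();
  }
}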




From: Dhanasekaran Anbalagan [bugcy...@gmail.com]
Sent: Wednesday, February 27, 2013 6:17 PM
To: cdh-user; user
Subject: How to get Under-replicated blocks information [Location]

Hi Guys,

I am running a three-machine cluster with replication factor 2. I had a problem
with replication, so I changed it to 2.
After I ran fsck, I got: Under-replicated blocks: 71 (0.0034828386 %)

 Total size: 105829415143 B
 Total dirs: 9704
 Total files: 2038873 (Files currently being written: 2)
 Total blocks (validated): 2038567 (avg. block size 51913 B) (Total open file 
blocks (not validated): 2)
 Minimally replicated blocks: 2038567 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 71 (0.0034828386 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 2
 Average block replication: 1.995
 Corrupt blocks: 0
 Missing replicas: 71 (0.0017414198 %)
 Number of data-nodes: 2
 Number of racks: 1
FSCK ended at Wed Feb 27 05:12:31 EST 2013 in 32647 milliseconds


Please guide me: what are the block locations of those 71 blocks in the HDFS file system?

-Dhanasekaran.
Did I learn something today? If not, I wasted it.


Re: QJM HA and ClusterID

2013-02-27 Thread Azuryy Yu
Patch available now. Could anybody take a look? Thanks.


On Wed, Feb 27, 2013 at 10:46 AM, Azuryy Yu  wrote:

> Hi Suresh,
>
> Thanks for your reply. I filed a bug:
> https://issues.apache.org/jira/browse/HDFS-4533
>
>
>
> On Wed, Feb 27, 2013 at 9:30 AM, Suresh Srinivas 
> wrote:
>
>> It looks like start-dfs.sh has a bug. It only takes the -upgrade option and
>> ignores clusterId.
>>
>> Consider running the command (which is what start-dfs.sh calls):
>> bin/hdfs start namenode -upgrade -clusterId <cluster_id>
>>
>> Please file a bug, if you can, for start-dfs.sh bug which ignores
>> additional parameters.
>>
>>
>> On Tue, Feb 26, 2013 at 4:50 PM, Azuryy Yu  wrote:
>>
>>> Anybody here? Thanks!
>>>
>>>
>>> On Tue, Feb 26, 2013 at 9:57 AM, Azuryy Yu  wrote:
>>>
 Hi all,
 I've been stuck on this question for several days. I want to upgrade my cluster
 from hadoop-1.0.3 to hadoop-2.0.3-alpha, and I've configured QJM successfully.

 How can I customize the clusterID myself? It currently generates a random
 clusterID.

 It doesn't work when I run:

 start-dfs.sh -upgrade -clusterId 12345-test

 Thanks!


>>>
>>
>>
>> --
>> http://hortonworks.com/download/
>>
>
>