Re: DataNode logs have exceptions - DataXceiver error processing unknown operation

2015-06-25 Thread Rajesh Kartha
Ah... I am using Ambari, so that does explain who is attempting to connect
to the DataNode so consistently.

Thank you for the prompt reply, Yusaku !!

Regards,
Rajesh

On Thu, Jun 25, 2015 at 5:09 PM, Yusaku Sako yus...@hortonworks.com wrote:

  Hi Rajesh,

  Are you running Ambari?  If so, this is benign and can be ignored.
 Ambari pings the DataNode by making a socket connection once a minute to
 make sure it's up and running.  Otherwise, it will trigger an alert.
 Unfortunately, there's no known way to ping the DataNode with a valid
 payload that avoids this log message (or at least, when the devs implemented
 this in Ambari, there didn't seem to be one).
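
 (As an illustration, a minimal sketch of the kind of bare port probe described
 above; the hostname is a placeholder and 50010 is the transfer port from the
 logs. Connecting and closing without sending a DataTransferProtocol opcode
 leaves readOp() with an empty stream, so DataInputStream.readShort() throws
 the EOFException seen in the stack trace.)

 import java.net.InetSocketAddress;
 import java.net.Socket;

 public class DataNodePortProbe {
     public static void main(String[] args) throws Exception {
         try (Socket s = new Socket()) {
             // Connect to the DataNode transfer port and close without
             // writing any payload, just like a simple liveness check would.
             s.connect(new InetSocketAddress("datanode-host", 50010), 5000);
         }
     }
 }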

  Yusaku

   From: Rajesh Kartha karth...@gmail.com
 Reply-To: user@hadoop.apache.org user@hadoop.apache.org
 Date: Thursday, June 25, 2015 4:57 PM
 To: user@hadoop.apache.org user@hadoop.apache.org
 Subject: DataNode logs have exceptions - DataXceiver error processing
 unknown operation

   Hello,

  I am using a Hadoop 2.7.1 build and noticed a constant flow of exceptions
 every 60 seconds in the DataNode log files:

 2015-06-25 13:02:36,292 ERROR datanode.DataNode
 (DataXceiver.java:run(278)) - bdavm063.svl.ibm.com:50010:DataXceiver
 error processing unknown operation  src: /127.0.0.1:54415 dst: /
 127.0.0.1:50010
 java.io.EOFException
 at java.io.DataInputStream.readShort(DataInputStream.java:315)
 at
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
 at
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
 at java.lang.Thread.run(Thread.java:745)
 2015-06-25 13:03:34,328 ERROR datanode.DataNode
 (DataXceiver.java:run(278)) - bdavm063.svl.ibm.com:50010:DataXceiver
 error processing unknown operation  src: /127.0.0.1:54441 dst: /
 127.0.0.1:50010
 java.io.EOFException
 at java.io.DataInputStream.readShort(DataInputStream.java:315)
 at
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
 at
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
 at java.lang.Thread.run(Thread.java:745)
 2015-06-25 13:05:36,081 ERROR datanode.DataNode
 (DataXceiver.java:run(278)) - bdavm063.svl.ibm.com:50010:DataXceiver
 error processing unknown operation  src: /127.0.0.1:54477 dst: /
 127.0.0.1:50010
 java.io.EOFException
 at java.io.DataInputStream.readShort(DataInputStream.java:315)
 at
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
 at
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
 at java.lang.Thread.run(Thread.java:745)



  I checked:
 - ulimit: open files (-n) is *32768*
 - dfs.datanode.max.transfer.threads is *16384*

  While HDFS seems to work without issues, the logs are filled with these.

  Any thoughts/ideas on resolving this are greatly appreciated.

  Regards,
  Rajesh




Re: DataNode logs have exceptions - DataXceiver error processing unknown operation

2015-06-25 Thread Rajesh Kartha
Forgot to ask...
- Is this new behavior in Ambari 2.1? I don't remember seeing it in
Ambari 1.7.
- Does a JIRA already exist for any of the components to track this?

Thanks,
Rajesh


On Thu, Jun 25, 2015 at 5:42 PM, Rajesh Kartha karth...@gmail.com wrote:

 Ah... I am using Ambari, so that does explain who is attempting to connect
 to the DataNode so consistently.

 Thank you for the prompt reply, Yusaku !!

 Regards,
 Rajesh

 On Thu, Jun 25, 2015 at 5:09 PM, Yusaku Sako yus...@hortonworks.com
 wrote:

  Hi Rajesh,

  Are you running Ambari?  If so, this is benign and can be ignored.
 Ambari pings the DataNode by making a socket connection once a minute
 to make sure it's up and running.  Otherwise, it will trigger an alert.
 Unfortunately, there's no known way to ping the DataNode with a valid
 payload that avoids this log message (or at least, when the devs implemented
 this in Ambari, there didn't seem to be one).

  Yusaku

   From: Rajesh Kartha karth...@gmail.com
 Reply-To: user@hadoop.apache.org user@hadoop.apache.org
 Date: Thursday, June 25, 2015 4:57 PM
 To: user@hadoop.apache.org user@hadoop.apache.org
 Subject: DataNode logs have exceptions - DataXceiver error processing
 unknown operation

   Hello,

  I am using a Hadoop 2.7.1 build and noticed a constant flow of
 exceptions every 60 seconds in the DataNode log files:

 2015-06-25 13:02:36,292 ERROR datanode.DataNode
 (DataXceiver.java:run(278)) - bdavm063.svl.ibm.com:50010:DataXceiver
 error processing unknown operation  src: /127.0.0.1:54415 dst: /
 127.0.0.1:50010
 java.io.EOFException
 at java.io.DataInputStream.readShort(DataInputStream.java:315)
 at
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
 at
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
 at java.lang.Thread.run(Thread.java:745)
 2015-06-25 13:03:34,328 ERROR datanode.DataNode
 (DataXceiver.java:run(278)) - bdavm063.svl.ibm.com:50010:DataXceiver
 error processing unknown operation  src: /127.0.0.1:54441 dst: /
 127.0.0.1:50010
 java.io.EOFException
 at java.io.DataInputStream.readShort(DataInputStream.java:315)
 at
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
 at
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
 at java.lang.Thread.run(Thread.java:745)
 2015-06-25 13:05:36,081 ERROR datanode.DataNode
 (DataXceiver.java:run(278)) - bdavm063.svl.ibm.com:50010:DataXceiver
 error processing unknown operation  src: /127.0.0.1:54477 dst: /
 127.0.0.1:50010
 java.io.EOFException
 at java.io.DataInputStream.readShort(DataInputStream.java:315)
 at
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
 at
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
 at java.lang.Thread.run(Thread.java:745)



  I checked:
 - ulimit: open files (-n) is *32768*
 - dfs.datanode.max.transfer.threads is *16384*

  While HDFS seems to work without issues, the logs are filled with these.

  Any thoughts/ideas on resolving this are greatly appreciated.

  Regards,
  Rajesh





DataNode logs have exceptions - DataXceiver error processing unknown operation

2015-06-25 Thread Rajesh Kartha
Hello,

I am using a Hadoop 2.7.1 build and noticed a constant flow of exceptions
every 60 seconds in the DataNode log files:

2015-06-25 13:02:36,292 ERROR datanode.DataNode (DataXceiver.java:run(278))
- bdavm063.svl.ibm.com:50010:DataXceiver error processing unknown
operation  src: /127.0.0.1:54415 dst: /127.0.0.1:50010
java.io.EOFException
at java.io.DataInputStream.readShort(DataInputStream.java:315)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
at java.lang.Thread.run(Thread.java:745)
2015-06-25 13:03:34,328 ERROR datanode.DataNode (DataXceiver.java:run(278))
- bdavm063.svl.ibm.com:50010:DataXceiver error processing unknown
operation  src: /127.0.0.1:54441 dst: /127.0.0.1:50010
java.io.EOFException
at java.io.DataInputStream.readShort(DataInputStream.java:315)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
at java.lang.Thread.run(Thread.java:745)
2015-06-25 13:05:36,081 ERROR datanode.DataNode (DataXceiver.java:run(278))
- bdavm063.svl.ibm.com:50010:DataXceiver error processing unknown
operation  src: /127.0.0.1:54477 dst: /127.0.0.1:50010
java.io.EOFException
at java.io.DataInputStream.readShort(DataInputStream.java:315)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
at java.lang.Thread.run(Thread.java:745)



I checked:
- ulimit: open files (-n) is *32768*
- dfs.datanode.max.transfer.threads is *16384*

While HDFS seems to work without issues, the logs are filled with these.

Any thoughts/ideas on resolving this are greatly appreciated.

Regards,
Rajesh


Re: can't submit remote job

2015-05-19 Thread Rajesh Kartha
Wondering if you have used the REST API to submit jobs:
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_APISubmit_Application

There are some issues that I have come across, but it does seem to work.
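
For what it's worth, a rough sketch of the two calls from that page (not a
complete submission: the host, port and JSON payload below are placeholders,
and a real request also needs the am-container-spec and resource fields
described in the docs):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class YarnRestSubmitSketch {
    public static void main(String[] args) throws Exception {
        // Step 1: ask the ResourceManager for a new application id
        HttpURLConnection newApp = (HttpURLConnection) new URL(
            "http://rm-host:8088/ws/v1/cluster/apps/new-application").openConnection();
        newApp.setRequestMethod("POST");
        System.out.println("new-application -> HTTP " + newApp.getResponseCode());

        // Step 2: POST the application submission context (JSON) to /ws/v1/cluster/apps
        HttpURLConnection submit = (HttpURLConnection) new URL(
            "http://rm-host:8088/ws/v1/cluster/apps").openConnection();
        submit.setRequestMethod("POST");
        submit.setRequestProperty("Content-Type", "application/json");
        submit.setDoOutput(true);
        String body = "{\"application-id\":\"<id from step 1>\","
            + "\"application-name\":\"remote-job\",\"application-type\":\"MAPREDUCE\"}";
        try (OutputStream os = submit.getOutputStream()) {
            os.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("submit -> HTTP " + submit.getResponseCode());
    }
}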


Also, regarding this message:
java.io.IOException: File
/tmp/hadoop-yarn/staging/xeon/.staging/job_1432045089375_0001/job.split
could only be replicated to 0 nodes instead of minReplication (=1).
There are 1 datanode(s) running and 1 node(s) are excluded in this
operation.

One node is excluded from the operation; do you know why it is excluded?


-Rajesh


On Tue, May 19, 2015 at 7:34 AM, xeonmailinglist-gmail 
xeonmailingl...@gmail.com wrote:

  This has been a real struggle to launch a remote MapReduce job. I know
 that there is Netflix Genie to submit the job, but for the purposes of
 this application (small and personal), I want to code it from scratch.

 I am debugging my code to see what is going on during the submission of a
 remote job, and now I have the error [1]. This error happens during the
 submission of the job, more precisely when it is writing to the remote
 HDFS. I have included the Hadoop code [2] where I get the error; the error
 [1] happens at the out.close() call in [2].

 The NameNode and the DataNodes are working properly. I have 1 NameNode and
 1 DataNode, and the replication factor is set to 1.
 Despite everything running OK, I still get this error. Any hints so that I
 can see what is going on?

 [1]

 2015-05-19 10:21:03,147 WARN 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
 place enough replicas, still in need of 1 to reach 1 (unavailab
 leStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, 
 storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
 newBlock=true) All required storage
  types are unavailable:  unavailableStorages=[DISK], 
 storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
 creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
 2015-05-19 10:21:03,147 DEBUG org.apache.hadoop.ipc.Server: Served: addBlock 
 queueTime= 1 procesingTime= 1 exception= IOException
 2015-05-19 10:21:03,147 DEBUG 
 org.apache.hadoop.security.UserGroupInformation: PrivilegedActionException 
 as:xeon (auth:SIMPLE) cause:java.io.IOException: File /tmp/hadoop
 -yarn/staging/xeon/.staging/job_1432045089375_0001/job.split could only be 
 replicated to 0 nodes instead of minReplication (=1).  There are 1 
 datanode(s) running and 1 no
 de(s) are excluded in this operation.
 2015-05-19 10:21:03,148 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
 6 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 
 194.117.18.101:39006 Call#18 Retry#0
 java.io.IOException: File 
 /tmp/hadoop-yarn/staging/xeon/.staging/job_1432045089375_0001/job.split could 
 only be replicated to 0 nodes instead of minReplication (=1).  There are 1 
 datanode(s) running and 1 node(s) are excluded in this operation.
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1549)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3200)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
 at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
 2015-05-19 10:21:03,148 DEBUG org.apache.hadoop.ipc.Server: IPC Server 
 handler 6 on 9000: responding to 
 org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 
 194.117.18.101:39006 Call#18 Retry#0

 [2]

 public static <T extends InputSplit> void createSplitFiles(Path jobSubmitDir,
     Configuration conf, FileSystem fs, T[] splits)
     throws IOException, InterruptedException {
   FSDataOutputStream out = createFile(fs,
       JobSubmissionFiles.getJobSplitFile(jobSubmitDir), conf);
   SplitMetaInfo[] info = writeNewSplits(conf, splits, out);
   out.close();
   writeJobSplitMetaInfo(fs, JobSubmissionFiles.getJobSplitMetaFile(jobSubmitDir),
       new

Re: hadoop.tmp.dir?

2015-05-18 Thread Rajesh Kartha
Hello,

The 3 main settings in hdfs-site.xml are:


   - *dfs.name.dir*: directory where the namenode stores its metadata;
   default value ${hadoop.tmp.dir}/dfs/name.
   - *dfs.data.dir*: directory where HDFS data blocks are stored;
   default value ${hadoop.tmp.dir}/dfs/data.
   - *dfs.namenode.checkpoint.dir*: directory where the secondary namenode
   stores its checkpoints; default value ${hadoop.tmp.dir}/dfs/namesecondary.



By default these use ${hadoop.tmp.dir}:
https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

but one can provide a comma-delimited list of directory paths to point to
multiple locations/disks and have the data distributed across them.
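
For illustration only (paths are placeholders; on Hadoop 2.x the current
property names are dfs.namenode.name.dir, dfs.datanode.data.dir and
dfs.namenode.checkpoint.dir, with the older names kept as deprecated aliases),
an hdfs-site.xml sketch along these lines:

<!-- namenode metadata: each listed directory holds a full redundant copy -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///disk1/dfs/name,file:///disk2/dfs/name</value>
</property>
<!-- datanode storage: blocks are spread across the listed disks -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///disk1/dfs/data,file:///disk2/dfs/data,file:///disk3/dfs/data</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>file:///disk1/dfs/namesecondary</value>
</property>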

HTH

-Rajesh



On Mon, May 18, 2015 at 2:41 PM, Caesar Samsi caesarsa...@mac.com wrote:

 Hello,



 hadoop.tmp.dir seems to be the root of all storage directories.



 I’d like for data to be stored in separate locations.



 Is there a list of directories and how they can be specified?



 Thank you, Caesar.



 (.tmp seems to indicate a temporary condition and yet it’s used by HDFS,
 etc.)



Re: Hive question

2015-05-15 Thread Rajesh Kartha
Hi Giri,

Have you tried the COALESCE function:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-ConditionalFunctions

An example for a date column type:
hive> select COALESCE(dt, cast('2014-02-23' as date)) from simpletest;

In any case, it may be good to post the question to the Hive user/dev group as
well.

-Rajesh


On Fri, May 15, 2015 at 11:31 AM, Giri P gpatc...@gmail.com wrote:

 Hi All,

 Since Hive is schema-on-read, when we try to write data of a different
 data type into a column it doesn't throw any error. When we try to read it,
 it actually shows NULL if it's a different data type.

 Are there any options to throw an error if the data is of a different data
 type when we try to insert or read?


 Thanks
 Giri



Re: max number of application master in YARN

2015-04-30 Thread Rajesh Kartha
With Capacity Scheduler, the other useful param would be:
yarn.scheduler.capacity.maximum-applications

http://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
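
A capacity-scheduler.xml sketch with illustrative values (the second property
is the one suggested in the reply quoted below; one caps the number of pending
plus running applications, the other caps the share of cluster resources that
ApplicationMasters may consume):

<property>
  <name>yarn.scheduler.capacity.maximum-applications</name>
  <value>10000</value>
</property>
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.1</value>
</property>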



On Thu, Apr 30, 2015 at 11:52 AM, Prashant Kommireddi prash1...@gmail.com
wrote:

 Take a look at

 yarn.scheduler.capacity.maximum-am-resource-percent



 On Thu, Apr 30, 2015 at 11:38 AM, Shushant Arora 
 shushantaror...@gmail.com wrote:

 Is there any configuration in MR2 and YARN to limit the maximum number of
 concurrent applications by setting a limit on ApplicationMasters in the cluster?





Re: ipc.client RetryUpToMaximumCountWithFixedSleep

2015-04-30 Thread Rajesh Kartha
Curious, did you check fs.defaultFS in core-site.xml? Just to make sure
the HDFS port is 9000 and not 8020.
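
Something like this in core-site.xml (hostname is a placeholder); the client
side must use the same scheme, host and port that the NameNode actually
listens on:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode-host:9000</value>
</property>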

-Rajesh

On Thu, Apr 30, 2015 at 4:42 AM, Mahmood Naderan nt_mahm...@yahoo.com
wrote:

  I found out that the $JAVA_HOME specified in hadoop-env.sh was different
 from the java -version on the command line, so I fixed the variable to point
 to Java 1.7 (the jar file is also built with 1.7).

 I still get the ipc.client error, but this time it sounds different. The whole
 output (in verbose mode) is available at http://pastebin.com/A7SzcqBD

 You can see at the bottom that Hadoop is up and works properly, which is why
 this message is really annoying. Do you have any idea about it? I faced this
 problem before, but at that time the report command showed that the datanode
 was off.

 This time I see the datanode is up, so I really wonder how to overcome this
 annoying error!

 Regards,
 Mahmood










Re: Linux Container Executor (LCE) vs Default Container Executor(DCE)

2015-03-26 Thread Rajesh Kartha
Thank you Harsh !!

Are there any other ways to find the owner of the containers? I suppose
one way is doing a *ps -ef | grep container* and viewing the process details.

Regards,
Rajesh

On Thu, Mar 26, 2015 at 11:31 AM, Harsh J ha...@cloudera.com wrote:

  In both cases the container is executed under the user submitting it.

 This is incorrect. The DCE executes containers as the NodeManager process
 user (typically 'yarn'), and the LCE in non-secure mode by default runs them
 only as 'nobody' (or an arbitrary static user) unless asked to run as the
 actual submitting user by switching off the static-user config.
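
 For reference, a yarn-site.xml sketch of the settings involved (the values
 shown are the usual defaults, so treat it as illustrative): with the LCE in
 non-secure mode, containers run as the static local-user below unless
 limit-users is switched off, in which case they run as the submitting user.

 <property>
   <name>yarn.nodemanager.container-executor.class</name>
   <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
 </property>
 <property>
   <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
   <value>true</value>
 </property>
 <property>
   <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user</name>
   <value>nobody</value>
 </property>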

 On Thu, Mar 26, 2015 at 8:46 PM, Rajesh Kartha karth...@gmail.com wrote:

 Hello,

 I was wondering what the main differences are between LCE and DCE under
 '*simple*' Hadoop security.

 From my reading, LCE gives:
 - granular control over execution, like banning users and enforcing a min uid
 - the use of cgroups to control resources

 While DCE uses ulimits.

 In both cases the container is executed under the user submitting it.

 Any further insights are appreciated.

 Thanks,
 Rajesh




 --
 Harsh J



Linux Container Executor (LCE) vs Default Container Executor(DCE)

2015-03-26 Thread Rajesh Kartha
Hello,

I was wondering what the main differences are between LCE and DCE under
'*simple*' Hadoop security.

From my reading, LCE gives:
- granular control over execution, like banning users and enforcing a min uid
- the use of cgroups to control resources

While DCE uses ulimits.

In both cases the container is executed under the user submitting it.

Any further insights are appreciated.

Thanks,
Rajesh


Re: HDFS data after nodes become unavailable?

2015-02-25 Thread Rajesh Kartha
Do you know why the 3 nodes are down? With replication, the copies of data
that were hosted on those failed nodes will not be available. However, the
data will still be served by the hosts holding the other 2 copies - so I
don't think you need to copy the data again.

Unless, for some reason, all 3 copies of some data ended up on those nodes,
in which case that data will not be available.

Maybe you could run 'hadoop fsck /' to confirm whether HDFS is healthy.

-Rajesh

On Wed, Feb 25, 2015 at 9:21 AM, tesm...@gmail.com tesm...@gmail.com
wrote:

 Dear all,

 I have transferred the data from local storage to HDFS in my 10-node
 Hadoop cluster. The replication factor is 3.

 Some nodes, say 3, are not available after some time. I can't use those
 nodes for computation or storage of data.

 What will happen to the data stored on HDFS on those nodes?

 Do I need to remove all the data from HDFS and copy it again?

 Regards,




Re: Encryption At Rest Question

2015-02-24 Thread Rajesh Kartha
I was trying out transparent data-at-rest encryption and was able to
set up the KMS, zones, etc. and add
files to the zone.

How do I confirm that the files I added to the encryption zone are encrypted?
Is there a way to view
the raw file? An *hdfs dfs -cat* shows me the actual contents of the files
since the datanode decrypts the data
before sending it.

Thanks,
Rajesh


On Fri, Feb 20, 2015 at 11:42 PM, Ranadip Chatterjee ranadi...@gmail.com
wrote:

 In the case of an SSL-enabled cluster, the DEK will be encrypted on the wire
 by the SSL layer.

 In the case of a non-SSL cluster, it is not. But the interceptor only
 gets the DEK and not the encrypted data, so the data is still safe. Only if
 the interceptor also manages to gain access to the encrypted data block and
 associate it with the corresponding DEK is the data compromised.
 Given that each HDFS file has a different DEK, the interceptor has to gain
 quite a bit of access before the data is compromised.

 On 18 February 2015 at 00:04, Plamen Jeliazkov 
 plamen.jeliaz...@wandisco.com wrote:

 Hey guys,

 I had a question about the new file encryption work done primarily in
 HDFS-6134.

 I was just curious, how is the DEK protected on the wire?
 Particularly after the KMS decrypts the EDEK and returns it to the client.

 Thanks,
 -Plamen







 --
 Regards,
 Ranadip Chatterjee



Re: Encryption At Rest Question

2015-02-24 Thread Rajesh Kartha
Thank you Olivier,

I suppose with the first suggestion - locking the directory to be unreadable
for other users - the HDFS permissions would
kick in and prevent an unauthorized user from reading the files.
However, I wanted to see the actual encrypted data, so I used the second
approach you suggested: with hadoop fsck /mysecureDir -files -blocks
-locations I can get the blocks for the directory, then go to the datanode and
cat the block files to see the encrypted (unreadable) data.
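
In outline, that check looks something like this (file name, block id and
datanode path are placeholders):

# 1. List the block ids and datanode locations for a file in the zone
hadoop fsck /mysecureDir/somefile.txt -files -blocks -locations

# 2. On one of the reported datanodes, locate the block file under
#    dfs.datanode.data.dir
find /data/dfs/data -name 'blk_1073741825'

# 3. cat-ing the block file directly shows ciphertext, while
#    'hdfs dfs -cat /mysecureDir/somefile.txt' returns the decrypted contents
cat /data/dfs/data/current/.../blk_1073741825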

Regards,
Rajesh

On Tue, Feb 24, 2015 at 12:28 PM, Olivier Renault orena...@hortonworks.com
wrote:

   You can try looking at it with a user who doesn’t have permission on
 the folder. An alternative is to check which block it is on Linux and
 look at the block using cat from a Linux shell.

  Olivier


   From: Rajesh Kartha karth...@gmail.com
 Reply-To: user@hadoop.apache.org user@hadoop.apache.org
 Date: Tuesday, 24 February 2015 19:47
 To: user@hadoop.apache.org user@hadoop.apache.org
 Cc: hdfs-...@hadoop.apache.org hdfs-...@hadoop.apache.org
 Subject: Re: Encryption At Rest Question

 I was trying out transparent data-at-rest encryption and was able
 to set up the KMS, zones, etc. and add
 files to the zone.

 How do I confirm that the files I added to the encryption zone are
 encrypted? Is there a way to view
 the raw file? An *hdfs dfs -cat* shows me the actual contents of the files
 since the datanode decrypts the data
 before sending it.

  Thanks,
  Rajesh


 On Fri, Feb 20, 2015 at 11:42 PM, Ranadip Chatterjee ranadi...@gmail.com
 wrote:

  In the case of an SSL-enabled cluster, the DEK will be encrypted on the
 wire by the SSL layer.

  In the case of a non-SSL cluster, it is not. But the interceptor only
 gets the DEK and not the encrypted data, so the data is still safe. Only if
 the interceptor also manages to gain access to the encrypted data block and
 associate it with the corresponding DEK is the data compromised.
 Given that each HDFS file has a different DEK, the interceptor has to gain
 quite a bit of access before the data is compromised.

 On 18 February 2015 at 00:04, Plamen Jeliazkov 
 plamen.jeliaz...@wandisco.com wrote:

 Hey guys,

 I had a question about the new file encryption work done primarily
 in HDFS-6134.

  I was just curious, how is the DEK protected on the wire?
 Particularly after the KMS decrypts the EDEK and returns it to the
 client.

  Thanks,
 -Plamen







 --
 Regards,
 Ranadip Chatterjee





Re: How to get Hadoop's Generic Options value

2015-02-20 Thread Rajesh Kartha
Here is an example:
https://adhoop.wordpress.com/2012/02/16/generate-a-list-of-anagrams-round-3/

-Rajesh

On Thu, Feb 19, 2015 at 9:32 PM, Haoming Zhang haoming.zh...@outlook.com
wrote:

 Thanks guys,

 I will try your solutions later and update the result!

 --
 From: unmeshab...@gmail.com
 Date: Fri, 20 Feb 2015 10:04:38 +0530
 Subject: Re: How to get Hadoop's Generic Options value
 To: user@hadoop.apache.org


Try implementing your driver as a Tool:

// needs imports from org.apache.hadoop.conf (Configuration, Configured)
// and org.apache.hadoop.util (Tool, ToolRunner)
public class YourDriver extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new YourDriver(), args));
    }
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();  // -D key=value options are already applied here
        return 0;
    }
}

 Then supply your file using -D option.
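
A hypothetical invocation (jar name, class and paths are placeholders); because
ToolRunner applies GenericOptionsParser, the -D value is then visible via
getConf().get("my.custom.key") inside run():

hadoop jar yourapp.jar YourDriver -D my.custom.key=some-value /input /output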

 Thanks
 Unmesha Biju