Re: Please unsubscribe me from this list

2020-09-16 Thread Arpit Agarwal
Email user-unsubscr...@hadoop.apache.org to unsubscribe.



> On Sep 16, 2020, at 9:49 AM, Amir Raza  wrote:
> 
> I don't want to receive these emails
> 
> Thanks



Re: Create a block - file map

2019-12-31 Thread Arpit Agarwal
That is the only way to do it using the client API.

Just curious why you need the mapping.
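
For reference, here is a minimal sketch of that listFiles-based approach in Java (a sketch only; it assumes HDFS is the default filesystem and omits error handling). Note that the LocatedFileStatus returned by listFiles already carries block locations, so a separate getFileBlockLocations call per file is not strictly needed:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class BlockFileMap {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Recursively list files under the directory; each status already
    // carries its block locations (offset, length, hosts).
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(new Path(args[0]), true);
    while (it.hasNext()) {
      LocatedFileStatus status = it.next();
      for (BlockLocation block : status.getBlockLocations()) {
        System.out.println(status.getPath() + "\t" + block.getOffset()
            + "\t" + block.getLength() + "\t" + String.join(",", block.getHosts()));
      }
    }
  }
}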


On Tue, Dec 31, 2019, 00:41 Davide Vergari  wrote:

> Hi all,
> I need to create a block map for all files in a specific directory (and
> subdir) in HDFS.
>
> I'm using the fs.listFiles API, then I loop over the
> RemoteIterator[LocatedFileStatus] returned by listFiles and for each
> LocatedFileStatus I use the getFileBlockLocations API to get all the block
> ids of that file, but it takes a long time because I have millions of files
> in the HDFS directory.
> I also tried to use Spark to parallelize the execution, but the HDFS API
> objects are not serializable.
>
> Is there a better way? I know there is the "hdfs oiv" command, but I can't
> access the NameNode directory directly; also, the fsimage file could be
> outdated and I can't force safemode to run the saveNamespace
> command.
>
> I'm using Scala 2.11 with Hadoop 2.7.1 (HDP 2.6.3)
>
> Thank you
>


Re: HDFS du Utility Inconsistencies?

2019-11-08 Thread Arpit Agarwal
Do you have any snapshots under that directory? Space that is still referenced
by a snapshot is counted in the parent's summary (-s / -count) but does not show
up under the live child directories, which can produce exactly this kind of
mismatch.
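
If you are not sure whether snapshots exist, 'hdfs lsSnapshottableDir' lists the snapshottable directories from the shell; the same check from the Java client API looks roughly like this (a sketch; it assumes the default filesystem is HDFS and that you have permission to query it):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.SnapshottableDirectoryStatus;

public class ListSnapshottableDirs {
  public static void main(String[] args) throws Exception {
    DistributedFileSystem dfs =
        (DistributedFileSystem) FileSystem.get(new Configuration());
    // Directories that allow snapshots; data retained only by snapshots is
    // included in the parent's space summary but not in the child listing.
    SnapshottableDirectoryStatus[] dirs = dfs.getSnapshottableDirListing();
    if (dirs != null) {
      for (SnapshottableDirectoryStatus d : dirs) {
        System.out.println(d.getFullPath() + " snapshots=" + d.getSnapshotNumber());
      }
    }
  }
}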

On Fri, Nov 8, 2019, 09:38 David M  wrote:

> All,
>
>
>
> I’m working on a cluster that is running Hadoop 2.7.3. I have one folder
> in particular where the command hdfs dfs -du is giving me strange results.
> If I query the folder and ask for a summary, it tells me 10 GB. If I don’t
> ask for a summary, all of the folders underneath don’t even add up to 1 GB,
> much less 10 GB.
>
>
>
> I’ve verified this is true over time and is true using the hdfs user or
> any other user. We are on an HDP cluster, so we are using Ranger for HDFS
> security, and Kerberos for authentication. We see similar results in
> -count, where the size and counts are both different. We have not seen this
> behavior in any other folders.
>
>
>
> See below for a sample output we are seeing. I’ve replaced the full path
> with a fake path to protect the data we have on the cluster. Does anyone
> know anything that would cause this behavior? Thanks!
>
>
>
> $ hdfs dfs -du -h /randomFolder
>
> 119.9 M  /randomFolder/bug
>
> 1.0 M/randomFolder/commitment
>
> 86.8 K   /randomFolder/customfield
>
> 31.3 M   /randomFolder/epic
>
> 10.3 M   /randomFolder/feature
>
> 4.0 M/randomFolder/insprintbug
>
> 372.9 K  /randomFolder/project
>
> 15.1 K   /randomFolder/projectstatus
>
> 330.9 M  /randomFolder/story
>
> 256.3 M  /randomFolder/subtask
>
> 74.7 K   /randomFolder/subtemplate
>
> 89.6 M   /randomFolder/task
>
> 7.4 M/randomFolder/techdebt
>
> 117.7 K  /randomFolder/template
>
> 617.9 K  /randomFolder/tempomember
>
> 8.2 K/randomFolder/tempoteam
>
> 1.4 M/randomFolder/tempoworklog
>
>
>
> $ hdfs dfs -du -h -s /randomFolder
>
> 10.6 G  /randomFolder
>
>
>
> David McGinnis
>
>
>


Re: Does HDFS read blocks simultaneously in multi-threaded way?

2019-06-26 Thread Arpit Agarwal
Correct. The blocks will be read sequentially.


> On Jun 26, 2019, at 10:51 AM, Daegyu Han  wrote:
> 
> Thank you for your response.
> 
> Assuming HDFS blocks (blk1~blk8) for file input.dat are on the local data 
> node, 
> does the map task read these blocks sequentially when trying to read local 
> blocks?
> 
> 
> On Thu, Jun 27, 2019 at 02:45, Arpit Agarwal <aagar...@cloudera.com> wrote:
> HDFS reads blocks sequentially. We can implement a multi-threaded block 
> reader in theory.
> 
> 
> > On Jun 26, 2019, at 5:05 AM, Daegyu Han <hdg9...@gmail.com> wrote:
> > 
> > Hi all,
> > 
> > Assuming HDFS has a 1GB file input.dat and a block size of 128MB.
> > 
> > Can the user read multithreaded when reading the input.dat file?
> > 
> > In other words, is not the block being read sequentially, but reading
> > multiple blocks at the same time?
> > 
> > If not, is it difficult to implement a multi-threaded block read?
> > 
> > Best Regards,
> > Daegyu
> > 
> > -
> > To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org 
> > <mailto:user-unsubscr...@hadoop.apache.org>
> > For additional commands, e-mail: user-h...@hadoop.apache.org 
> > <mailto:user-h...@hadoop.apache.org>
> > 
> 



Re: Does HDFS read blocks simultaneously in multi-threaded way?

2019-06-26 Thread Arpit Agarwal
HDFS reads blocks sequentially. We can implement a multi-threaded block reader 
in theory.
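
That said, an application can get block-level parallelism itself by opening the file from several threads and reading disjoint ranges with positioned reads. A rough sketch (thread count, buffer size and error handling are simplified; this only illustrates the idea):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParallelRangeRead {
  public static void main(String[] args) throws Exception {
    final FileSystem fs = FileSystem.get(new Configuration());
    final Path file = new Path(args[0]);
    FileStatus status = fs.getFileStatus(file);
    long blockSize = status.getBlockSize();
    long fileLen = status.getLen();

    ExecutorService pool = Executors.newFixedThreadPool(4);
    List<Future<Long>> results = new ArrayList<>();
    for (long offset = 0; offset < fileLen; offset += blockSize) {
      final long start = offset;
      final long len = Math.min(blockSize, fileLen - start);
      results.add(pool.submit(new Callable<Long>() {
        @Override
        public Long call() throws Exception {
          // Each task uses its own stream and positioned reads so tasks
          // do not share any stream state.
          byte[] buf = new byte[64 * 1024];
          long read = 0;
          try (FSDataInputStream in = fs.open(file)) {
            while (read < len) {
              int n = in.read(start + read, buf, 0, (int) Math.min(buf.length, len - read));
              if (n < 0) break;
              read += n;
            }
          }
          return read;
        }
      }));
    }
    long total = 0;
    for (Future<Long> f : results) total += f.get();
    pool.shutdown();
    System.out.println("Read " + total + " bytes");
  }
}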


> On Jun 26, 2019, at 5:05 AM, Daegyu Han  wrote:
> 
> Hi all,
> 
> Assuming HDFS has a 1GB file input.dat and a block size of 128MB.
> 
> Can the user read multithreaded when reading the input.dat file?
> 
> In other words, is not the block being read sequentially, but reading
> multiple blocks at the same time?
> 
> If not, is it difficult to implement a multi-threaded block read?
> 
> Best Regards,
> Daegyu
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: user-h...@hadoop.apache.org
> 


-
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org



Re: Issue formatting Namenode in HA cluster using Kerberos

2019-05-02 Thread Arpit Agarwal
You can use /etc/hosts entries as a workaround.

If this is a PoC/test environment, a less secure workaround is host-less
principals, i.e. omitting the _HOST pattern. This is not usually recommended,
since all service instances will then share the same principal and it may be
easier to impersonate a service if a keytab is compromised.
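
As an illustration only (the realm and principal below are placeholders, not values from your cluster), the host-less form of, say, the JournalNode principal in hdfs-site.xml would look like:

<property>
  <name>dfs.journalnode.kerberos.principal</name>
  <!-- the host-based form would be jn/_HOST@EXAMPLE.COM -->
  <value>jn@EXAMPLE.COM</value>
</property>

The same pattern applies to the NameNode and DataNode principal properties.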


> On Apr 30, 2019, at 1:11 PM, Adam Jorgensen  
> wrote:
> 
> Ah... reverse DNS, you say... oh dear.
> 
> As per https://github.com/docker/for-linux/issues/365 it seems that reverse
> DNS has been rather broken in Docker since early 2018 :-(
> 
> I'm going to have to do some digging to see if I can find some way to fix 
> this I guess, since the other option is a painful dance involving registering 
> Kerberos principals for the specific IPs
> 
> On Tue, Apr 30, 2019 at 9:56 PM Arpit Agarwal <aagar...@cloudera.com> wrote:
> Likely your reverse DNS is not configured properly. You can check it by 
> running ‘dig -x 10.0.0.238’.
> 
> 
>> On Apr 30, 2019, at 12:24 PM, Adam Jorgensen <adam.jorgen...@spandigital.com> wrote:
>> 
>> Hi all, my first post here. I'm looking for some help with an issue I'm 
>> having attempting to format my Namenode. I'm running a HA configuration and 
>> have configured Kerberos for authentication. 
>> Additionally, I am running Hadoop using Docker Swarm.
>> 
>> The issue I'm having is that when I attempt to format the Namenode the 
>> operation fails with complaints that QJM Journalnode do not have a valid 
>> Kerberos principal. However, the issue is more specific in that it seems 
>> like the operation to format the Journalnodes attempts to use a Kerberos 
>> principal of the form SERVICE/IP@REALM whereas the principals I have 
>> configured use the hostname rather than the IP.
>> 
>> If you take a look at the logging output captured below you will get a 
>> better idea of what the issue is.
>> 
>> Has anyone run into this before? Is there a way I can tell the Namenode 
>> format to use the correct principals?
>> 
>> adss_hdfs-namenode1.1.ybygx5r50v8y@Harvester| 2019-04-30 11:17:54,107 
>> INFO namenode.NameNode: STARTUP_MSG:  
>> adss_hdfs-namenode1.1.ybygx5r50v8y@Harvester| 
>> / 
>> adss_hdfs-namenode1.1.ybygx5r50v8y@Harvester| STARTUP_MSG: Starting 
>> NameNode 
>> adss_hdfs-namenode1.1.ybygx5r50v8y@Harvester| STARTUP_MSG:   host = 
>> hdfs-namenode1/10.0.0.60 <http://10.0.0.60/> 
>> adss_hdfs-namenode1.1.ybygx5r50v8y@Harvester| STARTUP_MSG:   args = 
>> [-format, -nonInteractive] 
>> adss_hdfs-namenode1.1.ybygx5r50v8y@Harvester| STARTUP_MSG:   version = 
>> 3.1.2 
>> adss_hdfs-namenode1.1.ybygx5r50v8y@Harvester| STARTUP_MSG:   classpath = 
>> /opt/hadoop-latest/etc/hadoop:/opt/hadoop-latest/share/hadoop/common/lib/commons-math3-3.1.1.jar:/opt/hadoop-latest/share/hadoop/common/lib/kerby-config-1.0.1.jar:/opt/hadoop-latest/sha
>> re/hadoop/common/lib/metrics-core-3.2.4.jar:/opt/hadoop-latest/share/hadoop/common/lib/kerb-util-1.0.1.jar:/opt/hadoop-latest/share/hadoop/common/lib/commons-lang3-3.4.jar:/opt/hadoop-latest/share/hadoop/common/lib/jackson-annotations-2.7.8.jar:/opt/hadoop-lates
>> t/share/hadoop/common/lib/jetty-util-9.3.24.v20180605.jar:/opt/hadoop-latest/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/opt/hadoop-latest/share/hadoop/common/lib/kerb-simplekdc-1.0.1.jar:/opt/hadoop-latest/share/hadoop/common/lib/commons-cli-1.2.jar:/opt/had
>> oop-latest/share/hadoop/common/lib/commons-compress-1.18.jar:/opt/hadoop-latest/share/hadoop/common/lib/jettison-1.1.jar:/opt/hadoop-latest/share/hadoop/common/lib/kerby-util-1.0.1.jar:/opt/hadoop-latest/share/hadoop/common/lib/commons-lang-2.6.jar:/opt/hadoop-l
>> atest/share/hadoop/common/lib/accessors-smart-1.2.jar:/opt/hadoop-latest/share/hadoop/common/lib/httpcore-4.4.4.jar:/opt/hadoop-latest/share/hadoop/common/lib/nimbus-jose-jwt-4.41.1.jar:/opt/hadoop-latest/share/hadoop/common/lib/commons-configuration2-2.1.1.jar:
>> /opt/hadoop-latest/share/hadoop/common/lib/jetty-server-9.3.24.v20180605.jar:/opt/hadoop-latest/share/hadoop/common/lib/paranamer-2.3.jar:/opt/hadoop-latest/share/hadoop/common/lib/jersey-servlet-1.19.jar:/opt/hadoop-latest/share/hadoop/common/lib/jackson-databi
>> nd-2.7.8.jar:/opt/hadoop-latest/share/hadoop/common/lib/jul-to-slf4j-1.7.25.jar:/opt/hadoop-latest/share/hadoop/common/lib/jcip-annotations-1.0-1.jar:/opt/hadoop-latest/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/opt/hadoop-latest/share/hadoop/common/lib/js
>> on-smart-2.3.jar:/opt/hadoop-latest/share/hadoop/common/lib/commons-bean

Re: Issue formatting Namenode in HA cluster using Kerberos

2019-04-30 Thread Arpit Agarwal
Likely your reverse DNS is not configured properly. You can check it by running 
‘dig -x 10.0.0.238’.


> On Apr 30, 2019, at 12:24 PM, Adam Jorgensen  
> wrote:
> 
> Hi all, my first post here. I'm looking for some help with an issue I'm 
> having attempting to format my Namenode. I'm running a HA configuration and 
> have configured Kerberos for authentication. 
> Additionally, I am running Hadoop using Docker Swarm.
> 
> The issue I'm having is that when I attempt to format the Namenode the 
> operation fails with complaints that QJM Journalnode do not have a valid 
> Kerberos principal. However, the issue is more specific in that it seems like 
> the operation to format the Journalnodes attempts to use a Kerberos principal 
> of the form SERVICE/IP@REALM whereas the principals I have configured use the 
> hostname rather than the IP.
> 
> If you take a look at the logging output captured below you will get a better 
> idea of what the issue is.
> 
> Has anyone run into this before? Is there a way I can tell the Namenode 
> format to use the correct principals?
> 
> adss_hdfs-namenode1.1.ybygx5r50v8y@Harvester| 2019-04-30 11:17:54,107 
> INFO namenode.NameNode: STARTUP_MSG:  
> adss_hdfs-namenode1.1.ybygx5r50v8y@Harvester| 
> / 
> adss_hdfs-namenode1.1.ybygx5r50v8y@Harvester| STARTUP_MSG: Starting 
> NameNode 
> adss_hdfs-namenode1.1.ybygx5r50v8y@Harvester| STARTUP_MSG:   host = 
> hdfs-namenode1/10.0.0.60  
> adss_hdfs-namenode1.1.ybygx5r50v8y@Harvester| STARTUP_MSG:   args = 
> [-format, -nonInteractive] 
> adss_hdfs-namenode1.1.ybygx5r50v8y@Harvester| STARTUP_MSG:   version = 
> 3.1.2 
> adss_hdfs-namenode1.1.ybygx5r50v8y@Harvester| STARTUP_MSG:   classpath = 
> /opt/hadoop-latest/etc/hadoop:/opt/hadoop-latest/share/hadoop/common/lib/commons-math3-3.1.1.jar:/opt/hadoop-latest/share/hadoop/common/lib/kerby-config-1.0.1.jar:/opt/hadoop-latest/sha
> re/hadoop/common/lib/metrics-core-3.2.4.jar:/opt/hadoop-latest/share/hadoop/common/lib/kerb-util-1.0.1.jar:/opt/hadoop-latest/share/hadoop/common/lib/commons-lang3-3.4.jar:/opt/hadoop-latest/share/hadoop/common/lib/jackson-annotations-2.7.8.jar:/opt/hadoop-lates
> t/share/hadoop/common/lib/jetty-util-9.3.24.v20180605.jar:/opt/hadoop-latest/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/opt/hadoop-latest/share/hadoop/common/lib/kerb-simplekdc-1.0.1.jar:/opt/hadoop-latest/share/hadoop/common/lib/commons-cli-1.2.jar:/opt/had
> oop-latest/share/hadoop/common/lib/commons-compress-1.18.jar:/opt/hadoop-latest/share/hadoop/common/lib/jettison-1.1.jar:/opt/hadoop-latest/share/hadoop/common/lib/kerby-util-1.0.1.jar:/opt/hadoop-latest/share/hadoop/common/lib/commons-lang-2.6.jar:/opt/hadoop-l
> atest/share/hadoop/common/lib/accessors-smart-1.2.jar:/opt/hadoop-latest/share/hadoop/common/lib/httpcore-4.4.4.jar:/opt/hadoop-latest/share/hadoop/common/lib/nimbus-jose-jwt-4.41.1.jar:/opt/hadoop-latest/share/hadoop/common/lib/commons-configuration2-2.1.1.jar:
> /opt/hadoop-latest/share/hadoop/common/lib/jetty-server-9.3.24.v20180605.jar:/opt/hadoop-latest/share/hadoop/common/lib/paranamer-2.3.jar:/opt/hadoop-latest/share/hadoop/common/lib/jersey-servlet-1.19.jar:/opt/hadoop-latest/share/hadoop/common/lib/jackson-databi
> nd-2.7.8.jar:/opt/hadoop-latest/share/hadoop/common/lib/jul-to-slf4j-1.7.25.jar:/opt/hadoop-latest/share/hadoop/common/lib/jcip-annotations-1.0-1.jar:/opt/hadoop-latest/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/opt/hadoop-latest/share/hadoop/common/lib/js
> on-smart-2.3.jar:/opt/hadoop-latest/share/hadoop/common/lib/commons-beanutils-1.9.3.jar:/opt/hadoop-latest/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar:/opt/hadoop-latest/share/hadoop/common/lib/curator-framework-2.13.0.jar:/opt/hadoop-latest/share/hadoop/co
> mmon/lib/netty-3.10.5.Final.jar:/opt/hadoop-latest/share/hadoop/common/lib/re2j-1.1.jar:/opt/hadoop-latest/share/hadoop/common/lib/woodstox-core-5.0.3.jar:/opt/hadoop-latest/share/hadoop/common/lib/jersey-json-1.19.jar:/opt/hadoop-latest/share/hadoop/common/lib/
> kerb-server-1.0.1.jar:/opt/hadoop-latest/share/hadoop/common/lib/kerb-core-1.0.1.jar:/opt/hadoop-latest/share/hadoop/common/lib/jersey-core-1.19.jar:/opt/hadoop-latest/share/hadoop/common/lib/kerb-crypto-1.0.1.jar:/opt/hadoop-latest/share/hadoop/common/lib/kerby
> -asn1-1.0.1.jar:/opt/hadoop-latest/share/hadoop/common/lib/snappy-java-1.0.5.jar:/opt/hadoop-latest/share/hadoop/common/lib/guava-11.0.2.jar:/opt/hadoop-latest/share/hadoop/common/lib/httpclient-4.5.2.jar:/opt/hadoop-latest/share/hadoop/common/lib/stax2-api-3.1.
> 4.jar:/opt/hadoop-latest/share/hadoop/common/lib/kerb-common-1.0.1.jar:/opt/hadoop-latest/share/hadoop/common/lib/asm-5.0.4.jar:/opt/hadoop-latest/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/opt/hadoop-latest/share/hadoop/common/lib/jsr311-api-1.1.1.jar
> 

Re: Memory error during hdfs dfs -format

2019-02-19 Thread Arpit Agarwal
Hi Dmitry, HDFS commands in Apache Hadoop do not launch any Python processes.
You'll have to find out more about which process or command is actually
consuming that memory.


On Tue, Feb 19, 2019, 11:23 Dmitry Goldenberg wrote:
>
> Hi,
>
> We've got a task in Ansible which returns a MemoryError during HDFS
> installation on a box with 64 GB memory total, 30 GB free at the moment.
>
> It appears that during the execution of the hdfs dfs -format command, a
> Python process is spawned which gobbles up ~32ish GB of memory and then the
> Ansible deploy fails.
>
> Any ideas as to how we could curtail / manage memory consumption better?
>
> Thanks
>


Re: Re[2]: Error: Could not find or load main class when Running Hadoop

2018-10-14 Thread Arpit Agarwal
You don't need to configure the classpath if you just extracted the Hadoop 
tarball under /usr/local/. 

The only required setting is JAVA_HOME, everything else should be inferred. Is 
it possible some directories got moved around after unpacking?


On 2018/10/14, 9:32 AM, "razo@"  wrote:

So how should I configure it?
I looked at Siva's answer here:
https://stackoverflow.com/questions/28260653/where-is-the-classpath-set-for-hadoop
and the output of $HADOOP_HOME/bin/hadoop classpath is:

hadoop2@master:~$ $HADOOP_HOME/bin/hadoop classpath

/usr/local/hadoop-2.9.1/etc/hadoop:/usr/local/hadoop-2.9.1/share/hadoop/common/lib/*:/usr/local/hadoop-2.9.1/share/hadoop/common/*:/usr/local/hadoop-2.9.1/share/hadoop/hdfs:/usr/local/hadoop-2.9.1/share/hadoop/hdfs/lib/*:/usr/local/hadoop-2.9.1/share/hadoop/hdfs/*:/usr/local/hadoop-2.9.1/share/hadoop/yarn/lib/*:/usr/local/hadoop-2.9.1/share/hadoop/yarn/*:/usr/local/hadoop-2.9.1/share/hadoop/mapreduce/lib/*:/usr/local/hadoop-2.9.1/share/hadoop/mapreduce/*:/usr/local/hadoop-2.9.1/contrib/capacity-scheduler/*.jar

But the problem stayed the same. 
How should I configure the classpath to be what you suggested?

P.S 
hadoop2@master:~$ $HADOOP_HOME/bin/hadoop jar 
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar 
org.apache.hadoop.examples.Grep input output 'dfs[a-z.]+'
Error: Could not find or load main class org.apache.hadoop.util.RunJar


On 2018/10/14 15:55:06, ITD  wrote: 
> Try and check your classpath, it looks a bit strange:
> 
> /usp/hdfs: what's this?
> 
> Working classpath is:
> /usr/local/hadoop/etc/hadoop:\
> /usr/local/hadoop/share/hadoop/common/lib/*:\
> /usr/local/hadoop/share/hadoop/common/*:\
> /usr/local/hadoop/share/hadoop/hdfs:\
> /usr/local/hadoop/share/hadoop/hdfs/lib/*:\
> /usr/local/hadoop/share/hadoop/hdfs/*:\
> /usr/local/hadoop/share/hadoop/mapreduce/lib/*:\
> /usr/local/hadoop/share/hadoop/mapreduce/*:\
> /usr/local/hadoop/share/hadoop/yarn:\
> /usr/local/hadoop/share/hadoop/yarn/lib/*:\
> /usr/local/hadoop/share/hadoop/yarn/*
> 
> 
> 
> >Sunday, October 14, 2018, 18:26 +03:00 from r...@post.bgu.ac.il:
> >
> >I tried, I still get the same error.
> >
> >On 2018/10/14 15:18:57, ITD < itdir...@mail.ru.INVALID > wrote: 
> >> Class names are case-sensitive in Java, so try
> >> 
> >> $HADOOP_HOME/bin/hadoop jar 
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar Grep 
input output 'dfs[a-z.]+'
> >> 
> >> >Sunday, October 14, 2018, 18:04 +03:00 from Or Raz <r...@post.bgu.ac.il>:
> >> >
> >> >I am using Hadoop 2.9.1 standalone (the folder I am using is the result of a
successful compilation of the source code), and whenever I run a Hadoop command
such as the following (where $HADOOP_HOME = /usr/local/hadoop, the Hadoop directory)
> >> >$HADOOP_HOME/bin/hadoop jar 
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar grep 
input output 'dfs[a-z.]+'
> >> >I get this error :
> >> >>Error: Could not find or load main class 
org.apache.hadoop.util.RunJar
> >> >At first, I was thinking that I am missing some environment variables 
but are they necessary for the standalone case? Why did I get this error? (I 
even replaced with another example, I believe it is a classpath problem)
> >> >
> >> >hadoop2@master:/usr/local/hadoop-2.9.1$ bin/hadoop classpath
> >> 
>/usr/local/hadoop-2.9.1/etc/hadoop:/usr/local/hadoop-2.9.1/share/hadoop/common/lib/*:/usr/local/hadoop-2.9.1/share/hadoop/common/*:/usr/local/hadoop-2.9.1/share/hadoop/hdfs:/usp/hdfs/*:/usr/local/hadoop-2.9.1/share/hadoop/yarn/lib/*:/usr/local/hadoop-2.9.1/share/hadoop/yarn/*:/usr/local/hadoop-2.9.1/share/hadoop/mapreduce/lib/*:/usr/local/hadoop-2.9.jar
> >> 
> >> 
> >> 
> >> 
> >
> >-
> >To unsubscribe, e-mail:  user-unsubscr...@hadoop.apache.org
> >For additional commands, e-mail:  user-h...@hadoop.apache.org
> >
> 
> 
> 
> 
> 

-
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org





Re: SFTPConnectionPool connections leakage

2018-08-21 Thread Arpit Agarwal
Hi Mikhail,

There are two ways to contribute a fix:

  1.  Attach a patch file to the Jira and click “Submit Patch”. I’ve made you a
contributor and assigned the Jira to you.
  2.  Submit a GitHub pull request with the Jira key in the title. I have not
tried this in a while, so I am not sure it still works.


From: Mikhail Pryakhin 
Date: Tuesday, August 21, 2018 at 2:40 PM
To: 
Subject: SFTPConnectionPool connections leakage

Hi,
I’ve come across a connection leakage while using SFTPFileSystem. Methods of 
SFTPFileSystem operate on poolable ChannelSftp instances, thus some methods of 
SFTPFileSystem are chained together resulting in establishing multiple 
connections to the SFTP server to accomplish one compound action, those methods 
are:

mkdirs method[1]. The public mkdirs method acquires a new ChannelSftp from the 
pool and then recursively creates directories, checking for the directory 
existence beforehand by calling the method exists[2] which delegates to the 
getFileStatus(ChannelSftp channel, Path file) method [3] and so on until it 
ends up in returning the FilesStatus instance [4]. The resource leakage occurs 
in the method getWorkingDirectory which calls the getHomeDirectory method [5] 
which in turn establishes a new connection to the sftp server instead of using 
an already created connection. As the mkdirs method is recursive this results 
in creating a huge number of connections.

open method [6]. This method returns an instance of FSDataInputStream which 
consumes SFTPInputStream instance which doesn't return an acquired ChannelSftp 
instance back to the pool but instead it closes it[7]. This leads to 
establishing another connection to an SFTP server when the next method is 
called on the FileSystem instance.

I’ve issued a Jira ticket https://issues.apache.org/jira/browse/HADOOP-15358, 
and fixed the connection leakage issue.
Could I create a pull request to merge the fix?

[1] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L658
[2] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L321
[3] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L202
[4] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L290
[5] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L640
[6] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L504
[7] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPInputStream.java#L123


Kind Regards,
Mike Pryakhin




Re: Hadoop Problem: Setup a Hadoop Multinode Cluster (2 Nodes)

2018-02-23 Thread Arpit Agarwal
Look for errors in your DataNode log file. It’s in $HADOOP_HOME/logs by
default.



On Feb 23, 2018, at 12:55 AM, Butler, RD, Mnr <17647...@sun.ac.za> wrote:

To whom it may concern

I have two computers, the one I work on (CENTOS installed) and a second 
computer (also CENTOS (server), to act as the datanode), both not in a VM 
environment. I want to create a multi-node cluster with these computers. I have 
directly connected the computers together to test for possible network issues 
(ports etc.) and found that not to be the issue. I also used the guide by
https://tecadmin.net/set-up-hadoop-multi-node-cluster-on-centos-redhat/# and


How to Set Up Hadoop Multi-Node Cluster on CentOS 
7/6
tecadmin.net
Our earlier article describing to how to setup single node cluster. This 
article will help you to Set Up Hadoop Multi-Node Cluster on CentOS/RHEL 7/6.




https://dwbi.org/etl/bigdata/183-setup-hadoop-cluster. I have created a 
'hadoop' user on both machines, with permissions, and established a 
password-less SSH access between both.

The hostnames of the two computers are:
1. NameNode (main computer): master
2. DataNode (the server): datanode1

My /etc/hosts file is as follows (showing 'computerIP' in place of the actual IPs):
computerIP master
computerIP datanode1

My .xml file configurations on the NameNode are:
1. core-site.xml:


fs.defaultFS
hdfs://master:8020/


io.file.buffer.size
131072


2. hdfs-site.xml:


dfs.namenode.name.dir
file:/opt/volume/namenode


dfs.datanode.data.dir
file:/opt/volume/datanode


dfs.namenode.checkpoint.dir
file:/opt/volume/namesecondary


dfs.replication

fs.defaultFS
hdfs://master:8020/


io.file.buffer.size
131072


3. mapred-site.xml:


mapreduce.framework.name
yarn


mapreduce.jobhistory.address
master:10020


mapreduce.jobhistory.webapp.address
master:19888


yarn.app.mapreduce.am.staging-dir
/user/app


mapred.child.java.opts
-Djava.security.egd=file:/dev/../dev/urandom


4. yarn-site.xml:


yarn.resourcemanager.hostname
master


yarn.resourcemanager.bind-host
0.0.0.0


yarn.nodemanager.bind-host
0.0.0.0


yarn.nodemanager.aux-services
mapreduce_shuffle


yarn.nodemanager.aux-services.mapreduce_shuffle.class
org.apache.hadoop.mapred.ShuffleHandler


yarn.log-aggregation-enable
true


yarn.nodemanager.local-dirs
file:/opt/volume/local


yarn.nodemanager.log-dirs
file:/opt/volume/yarn/log


yarn.nodemanager.remote-app-log-dir
hdfs://master:8020/var/log/hadoop-yarn/apps


5. JAVA_HOME (Where java is located):
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk
6. Slaves file:
datanode1
7. Masters file:
master


My .bashrc file is as follows:
export JAVA_HOME=/usr/lib/java-1.8.0
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/opt/hadoop/hadoop-2.8.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export CLASSPATH=$CLASSPATH:/usr/local/hadoop/lib/*:.

export HADOOP_OPTS="$HADOOP_OPTS -Djava.security.egd=file:/dev/../dev/urandom"

The permissions are as follows on both machines (from terminal):

[hadoop@master hadoop]$ ls -al /opt
total 0
drwxr-xr-x. 5 hadoop hadoop 44 Feb 15 16:05 .
dr-xr-xr-x. 17 root root 242 Feb 21 11:38 ..
drwxr-xr-x. 3 hadoop hadoop 53 Feb 15 16:00 hadoop
drwxr-xr-x. 2 hadoop hadoop 6 Sep 7 01:11 rh
drwxr-xr-x. 7 hadoop hadoop 84 Feb 20 11:27 volume
For the DataNode:
[hadoop@datanode1 ~]$ ls -al /opt
total 0
drwxrwxrwx. 4 hadoop hadoop 34 Feb 20 11:06 .
dr-xr-xr-x. 17 root root 242 Feb 19 16:13 ..
drwxr-xr-x. 3 hadoop hadoop 53 Feb 20 11:07 hadoop
drwxrwxrwx. 5 hadoop hadoop 59 Feb 21 09:53 volume


So when I go to format the namenode: hdfs namenode -format, I get that the 
NameNode is formatted on the 'master'.
And then I go start the system, $HADOOP_HOME/sbin/start-dfs.sh and get the 
following output:

[hadoop@master hadoop]$ $HADOOP_HOME/sbin/start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to 
/opt/hadoop/hadoop-2.8.3/logs/hadoop-hadoop-namenode-master.out
datanode1: starting datanode, logging to 
/opt/hadoop/hadoop-2.8.3/logs/hadoop-hadoop-datanode-datanode1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to 
/opt/hadoop/hadoop-2.8.3/logs/hadoop-hadoop-secondarynamenode-master.out
Showing that the datanode is started, yet i go to the 50070 terminal to find 

Re: Parameter repeated twice in hdfs-site.xml

2017-11-30 Thread Arpit Agarwal
That looks confusing, usability-wise.


  *   A related question is how can I see the parameters with which a datanode 
was launched in order to check these values


You can navigate to the conf servlet of the DataNode web UI e.g. 
http://w.x.y.z:50075/conf
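
If you want to check programmatically which definition wins, you can load just that file with the client Configuration class and print the effective value. A small sketch (the file path is an example; in my experience the last definition in the file takes effect unless an earlier one is marked final, but verify against your version):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class EffectiveConf {
  public static void main(String[] args) {
    // Load only the file in question, without the default resources.
    Configuration conf = new Configuration(false);
    conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
    System.out.println("dfs.client.use.datanode.hostname = "
        + conf.get("dfs.client.use.datanode.hostname"));
  }
}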

From: Alvaro Brandon 
Date: Thursday, November 30, 2017 at 5:33 AM
To: "user@hadoop.apache.org" 
Subject: Parameter repeated twice in hdfs-site.xml

What will happen if I have a repeated parameter in the configuration file for
HDFS? Below is an example of a file where two parameters
(dfs.client.use.datanode.hostname and dfs.datanode.use.datanode.hostname) are
repeated with contradictory values, true and false.
I need to know because I'm using a Docker image that builds the configuration 
file this way, through environmental variables and I want to know if it will 
create any conflicts. A related question is how can I see the parameters with 
which a datanode was launched in order to check these values



<property><name>dfs.datanode.use.datanode.hostname</name><value>false</value></property>
<property><name>dfs.datanode.use.datanode.ip.hostname</name><value>false</value></property>
<property><name>dfs.namenode.datanode.registration.ip-hostname-check</name><value>false</value></property>
<property><name>dfs.datanode.data.dir</name><value>file:///hadoop/dfs/data</value></property>
<property><name>dfs.client.use.datanode.hostname</name><value>false</value></property>
<property><name>dfs.namenode.rpc-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.namenode.servicerpc-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.namenode.http-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.namenode.https-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.client.use.datanode.hostname</name><value>true</value></property>
<property><name>dfs.datanode.use.datanode.hostname</name><value>true</value></property>






Re: Lots of warning messages and exception in namenode logs

2017-06-22 Thread Arpit Agarwal
Hi Omprakash,

Your description suggests DataNodes cannot send timely reports to the NameNode. 
You can check it by looking for ‘stale’ DataNodes in the NN web UI when this 
situation is occurring. A few ideas:


  *   Try increasing the NameNode RPC handler count a bit (set 
dfs.namenode.handler.count to 20 in hdfs-site.xml).
  *   Enable the NameNode service RPC port. This requires downtime and 
reformatting the ZKFC znode.
  *   Search for JvmPauseMonitor messages in your service logs. If you see any, 
try increasing JVM heap for that service.
  *   Enable debug logging as suggested here:

2017-06-21 12:11:30,626 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 1 to reach 2 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and 
org.apache.hadoop.net.NetworkTopology
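
To make the first and last suggestions concrete (the value here is illustrative; property and logger names are as they appear in stock Apache Hadoop):

In hdfs-site.xml:

  <property>
    <name>dfs.namenode.handler.count</name>
    <value>20</value>
  </property>

In the NameNode's log4j.properties:

  log4j.logger.org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy=DEBUG
  log4j.logger.org.apache.hadoop.net.NetworkTopology=DEBUG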


From: omprakash 
Date: Wednesday, June 21, 2017 at 9:23 PM
To: 'Ravi Prakash' 
Cc: 'user' 
Subject: RE: Lots of warning messages and exception in namenode logs

Hi Ravi,

Pasting below my core-site and hdfs-site  configurations. I have kept bare 
minimal configurations for my cluster.  The cluster started fine and I was able 
to put couple of 100K files on hdfs but then when I checked the logs there were 
errors/Exceptions. After restart of datanodes they work well for few thousand 
files but same problem again.  No idea what is wrong.

PS: I am pumping 1 file per second into HDFS, each approximately 1 KB in size.

I thought it may be due to space quota on datanodes but here is the output of 
hdfs dfs -report. Looks fine to me

$ hdfs dfsadmin -report

Configured Capacity: 42005069824 (39.12 GB)
Present Capacity: 38085839568 (35.47 GB)
DFS Remaining: 34949058560 (32.55 GB)
DFS Used: 3136781008 (2.92 GB)
DFS Used%: 8.24%
Under replicated blocks: 141863
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0

-
Live datanodes (2):

Name: 192.168.9.174:50010 (node5)
Hostname: node5
Decommission Status : Normal
Configured Capacity: 21002534912 (19.56 GB)
DFS Used: 1764211024 (1.64 GB)
Non DFS Used: 811509424 (773.92 MB)
DFS Remaining: 17067913216 (15.90 GB)
DFS Used%: 8.40%
DFS Remaining%: 81.27%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Wed Jun 21 14:38:17 IST 2017


Name: 192.168.9.225:50010 (node4)
Hostname: node5
Decommission Status : Normal
Configured Capacity: 21002534912 (19.56 GB)
DFS Used: 1372569984 (1.28 GB)
Non DFS Used: 658353792 (627.86 MB)
DFS Remaining: 17881145344 (16.65 GB)
DFS Used%: 6.54%
DFS Remaining%: 85.14%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Jun 21 14:38:19 IST 2017

core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hdfsCluster</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/mnt/hadoopData/hadoop/journal/node/local/data</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///mnt/hadoopData/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///mnt/hadoopData/hadoop/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>hdfsCluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.hdfsCluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hdfsCluster.nn1</name>
    <value>node1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hdfsCluster.nn2</name>
    <value>node22:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.hdfsCluster.nn1</name>
    <value>node1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.hdfsCluster.nn2</name>
    <value>node2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://node1:8485;node2:8485;node3:8485;node4:8485;node5:8485/hdfsCluster</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.hdfsCluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>node1:2181,node2:2181,node3:2181,node4:2181,node5:2181</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>

From: Ravi Prakash [mailto:ravihad...@gmail.com]
Sent: 22 June 2017 02:38
To: omprakash 
Cc: user 
Subject: Re: Lots of warning messages and exception in namenode logs

Hi Omprakash!
What is your default replication set to? What kind of disks do your datanodes 
have? Were you able to start a cluster with a simple configuration before you 
started tuning it?
HDFS tries to create the default number of replicas for a block on different 
datanodes. The Namenode tries to give a list of datanodes that the client can 
write replicas of the block to. If the Namenode is not able to construct a list 
with adequate number of datanodes, 

Re: Hadoop Installation on Windows 7 in 64 bit

2016-07-18 Thread Arpit Agarwal
Hi Vinodh,

Are there any spaces in your JAVA_HOME path? If so you need to use the short 
(8.3) path. E.g. c:\progra~1\java (assuming you haven’t done so already).
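
For example (the JDK folder name below is illustrative — substitute your actual install), in %HADOOP_HOME%\etc\hadoop\hadoop-env.cmd:

  set JAVA_HOME=c:\progra~1\Java\jdk1.8.0

You can list the 8.3 short names for a directory with "dir /x", e.g. "dir /x c:\" shows PROGRA~1 for "Program Files".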


From: Rakesh Radhakrishnan 
Date: Sunday, July 17, 2016 at 11:03 PM
To: Vinodh Nagaraj 
Cc: "user@hadoop.apache.org" 
Subject: Re: Hadoop Installation on Windows 7 in 64 bit

>>>I couldn't find folder conf in hadoop home.

Could you check %HADOOP_HOME%/etc/hadoop/hadoop-env.cmd path. May be, 
U:/Desktop/hadoop-2.7.2/etc/hadoop/hadoop-env.cmd location.

Typically HADOOP_CONF_DIR will be set to %HADOOP_HOME%/etc/hadoop. Could you 
check "HADOOP_CONF_DIR" env variable value, location of the hadoop cluster 
configuration.

Regards,
Rakesh


On Mon, Jul 18, 2016 at 10:47 AM, Vinodh Nagaraj 
> wrote:
Hi All,

I tried to install Hadoop 2.7.2 on a Windows 7 64-bit machine.
Java version is 1.8 and the path variables are set. That works fine.

When trying to execute start-all.cmd I got the error below. I couldn't find a
folder named conf in the Hadoop home.

U:\Desktop\hadoop-2.7.2\sbin>start-all.cmd
This script is Deprecated. Instead use start-dfs.cmd and start-yarn.cmd
Error: JAVA_HOME is incorrectly set.
   Please update U:\Desktop\hadoop-2.7.2\conf\hadoop-env.cmd


Any Suggestions.


Thanks,
Vinodh.N



Re: Authentication and security with hadoop

2016-07-13 Thread Arpit Agarwal
Hi Ravi,

Kerberos is the only supported mechanism for strong identity. Most Hadoop 
access controls are easily bypassed without Kerberos authentication.

Kerberos setup can be difficult. Most Kerberos complications arise with 
multi-homed hosts or if DNS/reverse DNS is broken. If you run into specific 
Kerberos operation issues you can ask for answers on this DL.

Apache Hadoop 2.7.3 will have improved documentation on Kerberos setup. 
Meanwhile you can find the updated docs here:
https://github.com/apache/hadoop/blob/branch-2.7.3/hadoop-common-project/hadoop-common/src/site/markdown/SecureMode.md#Multihoming


From: ravi teja 
Date: Wednesday, July 13, 2016 at 5:46 AM
To: "user@hadoop.apache.org" 
Subject: Authentication and security with hadoop

Hi Community,

We want to have authentication on Hadoop, meaning we want to make sure a user
is who he claims to be and cannot impersonate other users via environment
variables.

From many links, I see that the default choice with Hadoop is Kerberos.
As far as I understand, Ranger is more of a central place to manage the ACLs on
directories and is not involved in authentication.

The information available online is pretty old; I could not find any recent
information on security and authentication.

I wanted to know if there is a way other than Kerberos to provide this
authentication layer, because Kerberos caused many operational problems when we
used it with HDFS and we no longer use it.

Thanks in advance,
Ravi


Re: Hi

2016-05-06 Thread Arpit Agarwal
Hi, I assume you are asking about multi-node cluster setup on Windows. I don't 
recommend using Cygwin or ssh server on Windows as I have never tried it out 
myself and cannot be sure it works.

You can start services manually on each node. It is not as convenient as the 
Linux start-*.sh/stop-*.sh scripts which operate on all nodes.
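
For example, a rough sketch of manual startup (one command window per daemon, or wrap them as Windows services; %HADOOP_HOME% is assumed to point at your unpacked install):

  On the master:    %HADOOP_HOME%\bin\hdfs namenode
                    %HADOOP_HOME%\bin\yarn resourcemanager
  On each slave:    %HADOOP_HOME%\bin\hdfs datanode
                    %HADOOP_HOME%\bin\yarn nodemanager

SSH is only a convenience used by the Linux scripts to launch daemons on remote nodes, so starting the daemons this way does not require SSH or Cygwin.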

From: Abi
Date: Friday, May 6, 2016 at 11:15 AM
To: "a...@apache.org", "user@hadoop.apache.org"
Subject: Re: Hi

Arpit,
Hadoop in TRUE multi cluster mode does not have any installation instructions. 
That means these slaves are on a separate physical machine than the master.

I have a few questions


1. Is cygwin necessary to make it work with SSH

2. Can it be done without SSH ?

3. Are there any instructions for this? The reason I ask is because modifying 
the Linux instruction on cygwin is really hard as scripts do not work out of 
the box and lot of modifications are required.

On May 5, 2016 10:54:39 PM EDT, Abi wrote:


On May 5, 2016 5:31:25 PM EDT, Abi wrote:
Arpit,
Hadoop in TRUE multi cluster mode does not have any installation instructions. 
That means these slaves are on a separate physical machine than the master.

I have a few questions


1. Is cygwin necessary to make it work with SSH

2. Can it be done without SSH ?

3. Are there any instructions for this? The reason I ask is because modifying 
the Linux instruction on cygwin is really hard as scripts do not work out of 
the box and lot of modifications are required.



Re: Unsubscribe footer for user@h.a.o messages

2015-11-09 Thread Arpit Agarwal
Yeah it’s not working. I updated the INFRA issue.

On Nov 8, 2015, at 7:53 AM, Ted Yu <yuzhih...@gmail.com> wrote:

The INFRA JIRA was closed 2 days ago.

But the following post from today still doesn't carry footer:
http://search-hadoop.com/m/uOzYthBKLf2YvP0O1

FYI

On Thu, Nov 5, 2015 at 7:33 PM, Arpit Agarwal <aagar...@hortonworks.com> wrote:
Created https://issues.apache.org/jira/browse/INFRA-10725


From: Vinayakumar B <vinayakumar...@huawei.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Thursday, November 5, 2015 at 5:15 PM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: RE: Unsubscribe footer for user@h.a.o messages

+1,

Thanks Arpit

-Vinay

From: Brahma Reddy Battula [mailto:brahmareddy.batt...@hotmail.com]
Sent: Friday, November 06, 2015 8:27 AM
To: user@hadoop.apache.org
Subject: RE: Unsubscribe footer for user@h.a.o messages

+ 1 ( non-binding)..

Nice thought,Arpit..



Thanks And Regards
Brahma Reddy Battula


Subject: Re: Unsubscribe footer for user@h.a.o messages
From: m...@hortonworks.com
To: user@hadoop.apache.org
Date: Thu, 5 Nov 2015 21:23:41 +

+1 (non-binding)

On Nov 5, 2015, at 12:50 PM, Arpit Agarwal <aagar...@hortonworks.com> wrote:

Apache project mailing lists can add unsubscribe footers to messages. E.g. From
spark-user.
https://mail-archives.apache.org/mod_mbox/spark-user/201511.mbox/%3C5637830F.3070702%40uib.no%3E

If no one objects I will file an INFRA ticket to add the footer to user@h.a.o.
Unsubscribe requests are less frequent on the dev mailing lists so we can leave
those alone.





Unsubscribe footer for user@h.a.o messages

2015-11-05 Thread Arpit Agarwal
Apache project mailing lists can add unsubscribe footers to messages. E.g. From 
spark-user.
https://mail-archives.apache.org/mod_mbox/spark-user/201511.mbox/%3C5637830F.3070702%40uib.no%3E

If no one objects I will file an INFRA ticket to add the footer to user@h.a.o. 
Unsubscribe requests are less frequent on the dev mailing lists so we can leave 
those alone.


Re: Unsubscribe footer for user@h.a.o messages

2015-11-05 Thread Arpit Agarwal
Created https://issues.apache.org/jira/browse/INFRA-10725


From: Vinayakumar B <vinayakumar...@huawei.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Thursday, November 5, 2015 at 5:15 PM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: RE: Unsubscribe footer for user@h.a.o messages

+1,

Thanks Arpit

-Vinay

From: Brahma Reddy Battula [mailto:brahmareddy.batt...@hotmail.com]
Sent: Friday, November 06, 2015 8:27 AM
To: user@hadoop.apache.org
Subject: RE: Unsubscribe footer for user@h.a.o messages

+ 1 ( non-binding)..

Nice thought,Arpit..



Thanks And Regards
Brahma Reddy Battula


Subject: Re: Unsubscribe footer for user@h.a.o messages
From: m...@hortonworks.com
To: user@hadoop.apache.org
Date: Thu, 5 Nov 2015 21:23:41 +

+1 (non-binding)

On Nov 5, 2015, at 12:50 PM, Arpit Agarwal <aagar...@hortonworks.com> wrote:

Apache project mailing lists can add unsubscribe footers to messages. E.g. From
spark-user.
https://mail-archives.apache.org/mod_mbox/spark-user/201511.mbox/%3C5637830F.3070702%40uib.no%3E

If no one objects I will file an INFRA ticket to add the footer to user@h.a.o.
Unsubscribe requests are less frequent on the dev mailing lists so we can leave
those alone.



Re: Documentation inconsistency about append write in HDFS

2015-08-03 Thread Arpit Agarwal
Hi Thanh,

Thanks for bringing it up. Append is available in 2.x releases as you pointed 
out and is production-ready.

Can you please file a doc bug at https://issues.apache.org/jira/browse/HADOOP?
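
For reference, a minimal append example against the 2.x FileSystem API (a sketch; the path is illustrative and the file must already exist):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/tmp/append-demo.txt");
    // Append to an existing file; this fails if the file does not exist.
    try (FSDataOutputStream out = fs.append(file)) {
      out.write("one more line\n".getBytes("UTF-8"));
    }
  }
}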


On Aug 2, 2015, at 8:49 PM, Thanh Hong Dai <hdth...@tma.com.vn> wrote:

In the latest version of the documentation 
(http://hadoop.apache.org/docs/current2/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Simple_Coherency_Model
 and also documentation for version 2.x), it’s mentioned that “A file once 
created, written, and closed need not be changed. “ and “There is a plan to 
support appending-writes to files in the future.”

However, as far as I know, HDFS has supported append write since 0.21, based on 
this JIRA (https://issues.apache.org/jira/browse/HDFS-265) and the old version 
of the documentation in 2012 
(https://web.archive.org/web/20121221171824/http://hadoop.apache.org/docs/hdfs/current/hdfs_design.html#Appending-Writes+and+File+Syncs)

Various posts on the Internet also suggests that append write has been 
available in HDFS, and will always be available in Hadoop version 2 branch.

Can we update the documentation to reflect the most recent change? (Or will 
append write be deprecated or is it not ready for production use?)



Re: build hadoop-2.7.0 with win8.1

2015-06-08 Thread Arpit Agarwal
Hi, I have not seen that error before. I'd add the '-x' option to the sh 
command in the pom to get diagnostics.

<exec executable="${shell-executable}" dir="${project.build.directory}" failonerror="true">
  <arg line="./dist-copynativelibs.sh"/>
</exec>
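
i.e. (illustrative) change the arg so the script runs with shell tracing:

  <arg line="-x ./dist-copynativelibs.sh"/>

so that sh prints each command as it executes and the failing step shows up in the Maven output.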

I am sorry you are running into these issues. FWIW I documented my build steps 
on the wiki.
https://wiki.apache.org/hadoop/Hadoop2OnWindows
Most of us developers are using Windows Server 2008 R2 for building.

From: vergilchi
Reply-To: user@hadoop.apache.org
Date: Monday, June 8, 2015 at 1:55 AM
To: user@hadoop.apache.org
Subject: RE: build hadoop-2.7.0 with win8.1

Thank you first! It really works!
But now I have a new problem, maybe raised by dist-copynativelibs.sh.
I see that dist-copynativelibs.sh has been generated, as well as hadoop-common-2.7.0.jar
and hadoop-common-2.7.0-tests.jar. I am sure I have sh, mkdir, rm, cp, tar, and gzip
in the PATH environment variable. I use JDK 1.8 and Visual Studio 2010 Professional.
Is this caused by the Windows 8.1 system I use? When I execute bash --version:
bash --version
GNU bash, version 3.1.23(6)-release (i686-pc-msys)
Copyright (C) 2005 Free Software Foundation, Inc.


the error trace is this:

main:
 [exec] ./dist-copynativelibs.sh: option not availible on this NT BASH relea
se
 [exec] ./dist-copynativelibs.sh: fork: Bad file descriptor
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:
run (pre-dist) on project hadoop-common: An Ant BuildException has occured: exec
 returned: 128
[ERROR] around Ant part ...exec failonerror=true dir=C:\Users\vergil-chi\Des
ktop\hadoop-2.7.0-src\hadoop-2.7.0-src\hadoop-common-project\hadoop-common\targe
t executable=sh... @ 41:155 in C:\Users\vergil-chi\Desktop\hadoop-2.7.0-src\
hadoop-2.7.0-src\hadoop-common-project\hadoop-common\target\antrun\build-main.xm
l



Subject: Re: build hadoop-2.7.0 with win8.1
From: aagar...@hortonworks.com
To: user@hadoop.apache.org
Date: Sat, 6 Jun 2015 20:17:08 +

Hi vergilchi,

The Windows equivalent of the 'native' profile is 'native-win'. You will also 
need JDK 1.7.

From: vergilchi
Reply-To: user@hadoop.apache.org
Date: Saturday, June 6, 2015 at 8:58 AM
To: user@hadoop.apache.org
Subject: build hadoop-2.7.0 with win8.1

 Hi,
when I try to build hadoop-2.7.0-src on Windows, I run into a problem.
I followed BUILDING.txt, and I have all of this:

* Windows System
* JDK 1.6+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer
* Windows SDK or Visual Studio 2010 Professional
* Unix command-line tools from GnuWin32 or Cygwin: sh, mkdir, rm, cp, tar, gzip
* zlib headers (if building native code bindings for zlib)
when I execute mvn package -Pdist,native,docs -DskipTests
-Dadditionalparam=-Xdoclint:none
I get this:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:
run (make) on project hadoop-common: An Ant BuildException has occured: exec ret
urned: 2
[ERROR] around Ant part ...exec failonerror=true dir=C:\Users\vergil-chi\Des
ktop\hadoop-2.7.0-src\hadoop-2.7.0-src\hadoop-common-project\hadoop-common\targe
t/native executable=make... @ 7:164 in C:\Users\vergil-chi\Desktop\hadoop-2.
7.0-src\hadoop-2.7.0-src\hadoop-common-project\hadoop-common\target\antrun\build
-main.xml
[ERROR] - [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal o
rg.apache.maven.plugins:maven-antrun-plugin:1.7:run (make) on project hadoop-com
mon: An Ant BuildException has occured: exec returned: 2
around Ant part ...exec failonerror=true dir=C:\Users\vergil-chi\Desktop\had
oop-2.7.0-src\hadoop-2.7.0-src\hadoop-common-project\hadoop-common\target/native
 executable=make... @ 7:164 in C:\Users\vergil-chi\Desktop\hadoop-2.7.0-src\
hadoop-2.7.0-src\hadoop-common-project\hadoop-common\target\antrun\build-main.xm
l
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor
.java:216)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor
.java:153)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor
.java:145)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProje
ct(LifecycleModuleBuilder.java:116)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProje
ct(LifecycleModuleBuilder.java:80)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThre
adedBuilder.build(SingleThreadedBuilder.java:51)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(Lifecycl
eStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
at 

Re: build hadoop-2.7.0 with win8.1

2015-06-06 Thread Arpit Agarwal
Hi vergilchi,

The Windows equivalent of the 'native' profile is 'native-win'. You will also 
need JDK 1.7.

From: vergilchi
Reply-To: user@hadoop.apache.org
Date: Saturday, June 6, 2015 at 8:58 AM
To: user@hadoop.apache.org
Subject: build hadoop-2.7.0 with win8.1

 Hi,
when I try to build hadoop-2.7.0-src on Windows, I run into a problem.
I followed BUILDING.txt, and I have all of this:

* Windows System
* JDK 1.6+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer
* Windows SDK or Visual Studio 2010 Professional
* Unix command-line tools from GnuWin32 or Cygwin: sh, mkdir, rm, cp, tar, gzip
* zlib headers (if building native code bindings for zlib)
when I execute mvn package -Pdist,native,docs -DskipTests
-Dadditionalparam=-Xdoclint:none
I get this:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:
run (make) on project hadoop-common: An Ant BuildException has occured: exec ret
urned: 2
[ERROR] around Ant part ...exec failonerror=true dir=C:\Users\vergil-chi\Des
ktop\hadoop-2.7.0-src\hadoop-2.7.0-src\hadoop-common-project\hadoop-common\targe
t/native executable=make... @ 7:164 in C:\Users\vergil-chi\Desktop\hadoop-2.
7.0-src\hadoop-2.7.0-src\hadoop-common-project\hadoop-common\target\antrun\build
-main.xml
[ERROR] - [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal o
rg.apache.maven.plugins:maven-antrun-plugin:1.7:run (make) on project hadoop-com
mon: An Ant BuildException has occured: exec returned: 2
around Ant part ...exec failonerror=true dir=C:\Users\vergil-chi\Desktop\had
oop-2.7.0-src\hadoop-2.7.0-src\hadoop-common-project\hadoop-common\target/native
 executable=make... @ 7:164 in C:\Users\vergil-chi\Desktop\hadoop-2.7.0-src\
hadoop-2.7.0-src\hadoop-common-project\hadoop-common\target\antrun\build-main.xm
l
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor
.java:216)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor
.java:153)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor
.java:145)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProje
ct(LifecycleModuleBuilder.java:116)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProje
ct(LifecycleModuleBuilder.java:80)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThre
adedBuilder.build(SingleThreadedBuilder.java:51)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(Lifecycl
eStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:862)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:286)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:197)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Laun
cher.java:289)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.jav
a:229)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(La
uncher.java:415)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:
356)
Caused by: org.apache.maven.plugin.MojoExecutionException: An Ant BuildException
 has occured: exec returned: 2
around Ant part ...exec failonerror=true dir=C:\Users\admin\Desktop\had
oop-2.7.0-src\hadoop-2.7.0-src\hadoop-common-project\hadoop-common\target/native
 executable=make... @ 7:164 in C:\Users\admin\Desktop\hadoop-2.7.0-src\
hadoop-2.7.0-src\hadoop-common-project\hadoop-common\target\antrun\build-main.xm
l
at org.apache.maven.plugin.antrun.AntRunMojo.execute(AntRunMojo.java:355
)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(Default
BuildPluginManager.java:134)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor
.java:208)
... 20 more
Caused by: C:\Users\vergil-chi\Desktop\hadoop-2.7.0-src\hadoop-2.7.0-src\hadoop-
common-project\hadoop-common\target\antrun\build-main.xml:7: exec returned: 2
at org.apache.tools.ant.taskdefs.ExecTask.runExecute(ExecTask.java:646)
at org.apache.tools.ant.taskdefs.ExecTask.runExec(ExecTask.java:672)
at org.apache.tools.ant.taskdefs.ExecTask.execute(ExecTask.java:498)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
at 

Re: ack with firstBadLink as 192.168.1.12:50010?

2015-06-03 Thread Arpit Agarwal
I recall seeing this error due to a network misconfiguration. You may want to 
verify that IP addresses and host names are correctly setup.

From: Caesar Samsi
Reply-To: user@hadoop.apache.org
Date: Wednesday, June 3, 2015 at 8:07 PM
To: user@hadoop.apache.org
Subject: ack with firstBadLink as 192.168.1.12:50010?

I’ve just built my distributed cluster but am getting the following error when 
I try to use HDFS.

I’ve traced it by telnet to 192.168.1.12 50010 and it just waits there waiting 
for a connection but never happens.

If I telnet on that host using localhost (127.0.0.1) the telnet connection 
happens immediately.

What could be the cause?



hduser@hadoopmaster ~/hadoop $ hdfs dfs -copyFromLocal input input
15/06/03 20:03:36 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Got error, status message , ack with firstBadLink as 
192.168.1.12:50010
at 
org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1334)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1237)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
15/06/03 20:03:36 INFO hdfs.DFSClient: Abandoning 
BP-101149352-192.168.1.10-1433386347922:blk_1073741829_1005
15/06/03 20:03:36 INFO hdfs.DFSClient: Excluding datanode 
DatanodeInfoWithStorage[192.168.1.12:50010,DS-1347a6fe-6bad-4df8-88cb-21378b847839,DISK]
15/06/03 20:03:36 WARN hdfs.DFSClient: Slow waitForAckedSeqno took 70947ms 
(threshold=3ms)


Re: What is the git clone URL for a stable apache hadoop release

2015-05-26 Thread Arpit Agarwal
Rongzheng,

The correct URL is https://git-wip-us.apache.org/repos/asf/hadoop.git

To target a stable release you can either checkout the corresponding git tag or 
download the source release from https://hadoop.apache.org/releases.html

You may see occasional test failures as some tests are flaky so feel free to 
file a Jira and attach test logs if the issue has not been reported already. If 
you see a large number of failures it could be specific to your setup.
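
For example (the tag pattern below is illustrative — run 'git tag -l' to see what is actually in the repo):

  git clone https://git-wip-us.apache.org/repos/asf/hadoop.git
  cd hadoop
  git tag -l '*2.7.0*'
  git checkout <one of the listed release tags>

A source tarball from the releases page corresponds to the matching release tag, so either route gives you the same code.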






On 5/26/15, 3:18 PM, rongzheng yan rongzheng@oracle.com wrote:

Hello,

I cloned a local Git repository from Apache repository from URL 
http://git.apache.org/hadoop.git. Before I did any change, I tried to 
build and run the tests, but got several test failures.

Is any test failure expected in Apache repository? From JIRA 
Hadoop-11636, it seems that there are some test failures left in the 
Apache repository. If this is true, where can I get the git clone URL 
for a stable release? (e.g. apache hadoop 2.7.0) Is a stable release 
clean, without any test failure?

Thanks in advance,

Rongzheng



Re: Building Hadoop on Windows, SDK 7.1, Error An Ant BuildException has occured: input file hadoop-2.6.0-src\hadoop-common-project\hadoop-common\target\findbugsXml.xml does not exist

2015-04-07 Thread Arpit Agarwal
Hi Umesh,

I use the following command to generate a Windows package for testing:
mvn install package -Pdist -Pnative-win -Dtar -DskipTests=true 
-Dmaven.site.skip=true -Dmaven.javadoc.skip=true

I have not tried the findbugs goal on Windows so I can't say whether it will 
work.

From: Umesh Kant
Reply-To: user@hadoop.apache.org, Umesh Kant
Date: Sunday, April 5, 2015 at 4:31 PM
To: user@hadoop.apache.org, Umesh Kant
Subject: Re: Building Hadoop on Windows, SDK 7.1, Error An Ant BuildException 
has occured: input file 
hadoop-2.6.0-src\hadoop-common-project\hadoop-common\target\findbugsXml.xml 
does not exist

Forgot to mention, am using following Maven command

mvn package -X -Pdist -Pdocs -Psrc -Dtar -DskipTests -Pnative-win 
findbugs:findbugs

Thanks,
Umesh


From: Umesh Kant kantum...@yahoo.com
To: user@hadoop.apache.org
Sent: Friday, April 3, 2015 11:35 PM
Subject: Building Hadoop on Windows, SDK 7.1, Error An Ant BuildException has 
occured: input file 
hadoop-2.6.0-src\hadoop-common-project\hadoop-common\target\findbugsXml.xml 
does not exist

All,

I am trying to build hadoop 2.6.0 on Windows 7 64 bit, Windows 7.1 SDK. I have 
gone through the BUILDING.txt file and did follow all the pre-requisites for 
building on Windows. Still, when I try to build, I am getting the following error:

[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 04:35 min
[INFO] Finished at: 2015-04-03T23:16:57-04:00
[INFO] Final Memory: 123M/1435M
[INFO] 
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (site) on project hadoop-common: An Ant BuildException has occured: input file C:\H\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\target\findbugsXml.xml does not exist
[ERROR] around Ant part ...xslt in=C:\H\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\target/findbugsXml.xml style=C:\findbugs-3.0.1/src/xsl/default.xsl out=C:\H\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\target/site/findbugs.html/... @ 44:232 in C:\H\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\target\antrun\build-main.xml
[ERROR] - [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (site) on project hadoop-common: An Ant BuildException has occured: input file C:\H\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\target\findbugsXml.xml does not exist
around Ant part ...xslt in=C:\H\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\target/findbugsXml.xml style=C:\findbugs-3.0.1/src/xsl/default.xsl out=C:\H\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\target/site/findbugs.html/... @ 44:232 in C:\H\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\target\antrun\build-main.xml
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:216)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:862)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:286)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:197)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: org.apache.maven.plugin.MojoExecutionException: An 

Re: Adding datanodes to Hadoop cluster - Will data redistribute?

2015-02-08 Thread Arpit Agarwal
Hi Manoj,

Existing data is not automatically redistributed when you add new DataNodes. 
Take a look at the 'hdfs balancer' command which can be run as a separate 
administrative tool to rebalance data distribution across DataNodes.


From: Manoj Venkatesh manove...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Friday, February 6, 2015 at 11:34 AM
To: user@hadoop.apache.org
Subject: Adding datanodes to Hadoop cluster - Will data redistribute?

Dear Hadoop experts,

I have a Hadoop cluster of 8 nodes, 6 were added during cluster creation and 2 
additional nodes were added later to increase disk and CPU capacity. What i see 
is that processing is shared amongst all the nodes whereas the storage is 
reaching capacity on the original 6 nodes whereas the newly added machines have 
relatively large amount of storage still unoccupied.

I was wondering if there is an automated way (or any way) of redistributing data so 
that all the nodes are equally utilized. I have checked the configuration 
parameter dfs.datanode.fsdataset.volume.choosing.policy, which has the options 'Round 
Robin' or 'Available Space'; are there any other configurations which need to 
be reviewed?

Thanks,
Manoj


Re: multihoming cluster

2015-01-20 Thread Arpit Agarwal
Hi Ingo,

HDFS requires some extra configuration for multihoming. These settings are
documented at:
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html

I am not sure all these settings were supported prior to Apache Hadoop 2.4.
I recommend using 2.6 if you can.

Arpit
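
As a sketch only: dfs.client.use.datanode.hostname is a client-side setting, so it needs to be present in the configuration of the machine issuing the copy, not only in the namenode's files. A minimal Java client illustrating the idea follows (the local path and target directory are taken from the post below; the file name is made up). When the hadoop CLI is used instead, the same property can be placed in the client machine's hdfs-site.xml.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MultihomedPut {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // ask the client to connect to datanodes by hostname instead of the
    // datatransfer-network IP addresses the namenode reports back
    conf.setBoolean("dfs.client.use.datanode.hostname", true);
    try (FileSystem fs = FileSystem.get(conf)) {
      fs.copyFromLocalFile(new Path("d:/temp/one.dat"),   // hypothetical local file
          new Path("/user/me/to_load/one.dat"));
    }
  }
}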

On Mon, Jan 19, 2015 at 11:56 PM, Thon, Ingo ingo.t...@siemens.com wrote:


 Dear List,

 I’m using Hadoop in a Multi-homed environment. Basically the Namenode, and
 Datanodes are connected via a special network for datatransfer
 10.xxx.xxx.xxx.
 I installed the Hadoop tool on a computer which can access the nodes in
 the hadoop cluster via a second network 192.168.xxx.xxx .
 I want to use this computer to copy data into HDFS. However, all
 operations which try to copy data directly onto the datanodes are failing.
 Basically I can do ls, mkdir and even copy empty files, however, commands
 like:
 hadoop fs -put d:/temp/* hdfs://192.168.namenode/user/me/to_load/
 are failing.
 As you can see in the hadoop tool output below the client is trying to
 access the datanodes via the IP addresses from the datatransfer network and
 not via the public second network.
 The strange thing is that in the configuration files on the namenode the parameter
 dfs.client.use.datanode.hostname is set to true. From my understanding I,
 therefore, shouldn’t see the log line
 15/01/19 13:51:11 DEBUG hdfs.DFSClient: pipeline = 10.x.x.13:50010
 at all.

 thanks in advance,
 Ingo Thon

 Output from hadoop command
 15/01/19 13:51:11 DEBUG ipc.Client: IPC Client (7749777) connection to
 /192.168.xxx.xxx:8020 from me sending #12
 15/01/19 13:51:11 DEBUG ipc.Client: IPC Client (7749777) connection to
 /192.168.xxx.xxx:8020 from thon_i got value #12
 15/01/19 13:51:11 DEBUG ipc.ProtobufRpcEngine: Call: addBlock took 0ms
 15/01/19 13:51:11 DEBUG hdfs.DFSClient: pipeline = 10.x.x.13:50010
 15/01/19 13:51:11 DEBUG hdfs.DFSClient: Connecting to datanode
 10.x.x.13:50010
 15/01/19 13:51:21 DEBUG ipc.Client: IPC Client (7749777) connection to
 /192.168.xxx.xxx:8020 from thon_i: closed
 15/01/19 13:51:21 DEBUG ipc.Client: IPC Client (7749777) connection to
 /192.168.xxx.xxx:8020 from thon_i: stopped, remaining connections 0
 15/01/19 13:51:32 INFO hdfs.DFSClient: Exception in createBlockOutputStream
 java.net.ConnectException: Connection timed out: no further information
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
 at
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
 at
 org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1526)
 at
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1328)
 at
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1281)
 at
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:526)
 15/01/19 13:51:32 INFO hdfs.DFSClient: Abandoning
 BP-20yyy26-10.x.x.x-1415y790:blk_1074387723_646941
 15/01/19 13:51:32 DEBUG ipc.Client: The ping interval is 6 ms.





Re: multihoming cluster

2015-01-20 Thread Arpit Agarwal
Also the log message you pointed out is somewhat misleading. The actual
connection attempt will respect dfs.client.use.datanode.hostname.

In createSocketForPipeline:
  static Socket createSocketForPipeline(final DatanodeInfo first,
      final int length, final DFSClient client) throws IOException {
    final String dnAddr = first.getXferAddr(
        client.getConf().connectToDnViaHostname);
    if (DFSClient.LOG.isDebugEnabled()) {
      DFSClient.LOG.debug("Connecting to datanode " + dnAddr);
    }
    final InetSocketAddress isa = NetUtils.createSocketAddr(dnAddr);

The useful log message is this one:
15/01/19 13:51:11 DEBUG hdfs.DFSClient: Connecting to datanode
10.x.x.13:50010

A quick guess is that the slaves configuration file on your NN has 10.x IP
addresses instead of hostnames.

On Tue, Jan 20, 2015 at 7:49 PM, Arpit Agarwal aagar...@hortonworks.com
wrote:

 Hi Ingo,

 HDFS requires some extra configuration for multihoming. These settings are
 documented at:

 https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html

 I am not sure all these settings were supported prior to Apache Hadoop
 2.4. I recommend using 2.6 if you can.

 Arpit

 On Mon, Jan 19, 2015 at 11:56 PM, Thon, Ingo ingo.t...@siemens.com
 wrote:


 Dear List,

 I’m using Hadoop in a Multi-homed environment. Basically the Namenode,
 and Datanodes are connected via a special network for datatransfer
 10.xxx.xxx.xxx.
 I installed the Hadoop tool on a computer which can access the nodes in
 the hadoop cluster via a second network 192.168.xxx.xxx .
 I want to use this computer to copy data into HDFS. However, all
 operations which try to copy data directly onto the datanodes are failing.
 Basically I can do ls, mkdir and even copy empty files, however, commands
 like:
 hadoop fs -put d:/temp/* hdfs://192.168.namenode/user/me/to_load/
 are failing.
 As you can see in the hadoop tool output below the client is trying to
 access the datanodes via the IP addresses from the datatransfer network and
 not via the public second network.
  The strange thing is that in the configuration files on the namenode the
  parameter dfs.client.use.datanode.hostname is set to true. From my
  understanding I, therefore, shouldn’t see the log line
  15/01/19 13:51:11 DEBUG hdfs.DFSClient: pipeline = 10.x.x.13:50010
  at all.

 thanks in advance,
 Ingo Thon

 Output from hadoop command
 15/01/19 13:51:11 DEBUG ipc.Client: IPC Client (7749777) connection to
 /192.168.xxx.xxx:8020 from me sending #12
 15/01/19 13:51:11 DEBUG ipc.Client: IPC Client (7749777) connection to
 /192.168.xxx.xxx:8020 from thon_i got value #12
 15/01/19 13:51:11 DEBUG ipc.ProtobufRpcEngine: Call: addBlock took 0ms
 15/01/19 13:51:11 DEBUG hdfs.DFSClient: pipeline = 10.x.x.13:50010
 15/01/19 13:51:11 DEBUG hdfs.DFSClient: Connecting to datanode
 10.x.x.13:50010
 15/01/19 13:51:21 DEBUG ipc.Client: IPC Client (7749777) connection to
 /192.168.xxx.xxx:8020 from thon_i: closed
 15/01/19 13:51:21 DEBUG ipc.Client: IPC Client (7749777) connection to
 /192.168.xxx.xxx:8020 from thon_i: stopped, remaining connections 0
 15/01/19 13:51:32 INFO hdfs.DFSClient: Exception in
 createBlockOutputStream
 java.net.ConnectException: Connection timed out: no further information
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
 at
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
 at
 org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1526)
 at
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1328)
 at
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1281)
 at
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:526)
 15/01/19 13:51:32 INFO hdfs.DFSClient: Abandoning
 BP-20yyy26-10.x.x.x-1415y790:blk_1074387723_646941
 15/01/19 13:51:32 DEBUG ipc.Client: The ping interval is 6 ms.







Re: Error while creating hadoop package 2.6.0 with Maven 3.2.3

2014-12-17 Thread Arpit Agarwal
Hi Venkat, you will need sh.exe on your path. It is part of the GnuWin32
toolset.

See BUILDING.txt in the source tree for details.

On Wed, Dec 17, 2014 at 12:30 AM, Venkat Ramakrishnan 
venkat.archit...@gmail.com wrote:

 Hello,

 I am getting an error in 'Project Dist POM', while
 generating Hadoop 2.6.0 package with Maven 3.2.3.
 The error says it could not find 'sh'.  I am building the
 package on a windows 7 machine, so I am not sure how to fix this.

 Logs attached. Any help would be greatly appreciated!

 Thanks & Best Regards,
 Venkat Ramakrishnan.






Re: Running job issues

2014-08-27 Thread Arpit Agarwal
Susheel is right. I've fixed the typo on the wiki page.


On Wed, Aug 27, 2014 at 12:28 AM, Susheel Kumar Gadalay skgada...@gmail.com
 wrote:

 You have to use this command to format

 hdfs namenode -format

 not hdfs dfs -format

 On 8/27/14, Blanca Hernandez blanca.hernan...@willhaben.at wrote:
  Hi, thanks for your answers.
 
  Sorry, I forgot to add it, I couldn't run the command either:
 
  C:\development\tools\hadoop>%HADOOP_PREFIX%\bin\hdfs dfs -format
  -format: Unknown command
 
  C:\development\tools\hadoop>echo %HADOOP_PREFIX%
  C:\development\tools\hadoop
 
  By using –help command there is no format param.
  My Hadoop version: 2.4.0 (the latest version supported by mongo-hadoop).
 
  Best regards,
 
  Blanca
 
 
  Von: Arpit Agarwal [mailto:aagar...@hortonworks.com]
  Gesendet: Dienstag, 26. August 2014 21:39
  An: user@hadoop.apache.org
  Betreff: Re: Running job issues
 
  And the namenode does not even start: 14/08/26 12:01:09 WARN
  namenode.FSNamesystem: Encountered exception loading fsimage
  java.io.IOException: NameNode is not formatted.
 
  Have you formatted HDFS (step 3.4)?
 
  On Tue, Aug 26, 2014 at 3:08 AM, Blanca Hernandez
  blanca.hernan...@willhaben.at
  wrote:
  Hi!
 
  I have just installed hadoop in my windows x64 machine. I followed carefully
  the instructions in https://wiki.apache.org/hadoop/Hadoop2OnWindows but in
  the 3.5 and 3.6 points I have some problems I can not handle.
 
  %HADOOP_PREFIX%\sbin\start-dfs.cmd
 
 
  The datanode can not connect: 14/08/26 12:01:30 WARN datanode.DataNode:
  Problem connecting to server: 0.0.0.0/0.0.0.0:19000
  And the namenode does not even start: 14/08/26 12:01:09 WARN
  namenode.FSNamesystem: Encountered exception loading fsimage
  java.io.IOException: NameNode is not formatted.
 
  Trying to get the mapreduce example indicated in point 3.6 running, I get
  the connection exception again. That brings me to the
  http://wiki.apache.org/hadoop/ConnectionRefused page where the exception is
  explained. So I guess I have some misunderstandings with the configuration.
  I tried to find information about that and found the page
  http://wiki.apache.org/hadoop/HowToConfigure but it is still not very clear
  for me.
 
  I attach my config files, the ones I modified maybe you can help me out…
 
  Many thanks!!
 
 
  CONFIDENTIALITY NOTICE
  NOTICE: This message is intended for the use of the individual or entity
 to
  which it is addressed and may contain information that is confidential,
  privileged and exempt from disclosure under applicable law. If the
 reader of
  this message is not the intended recipient, you are hereby notified that
 any
  printing, copying, dissemination, distribution, disclosure or forwarding
 of
  this communication is strictly prohibited. If you have received this
  communication in error, please contact the sender immediately and delete
 it
  from your system. Thank You.
 




Re: Running job issues

2014-08-26 Thread Arpit Agarwal
 And the namenode does not even start: 14/08/26 12:01:09 WARN

 namenode.FSNamesystem: Encountered exception loading fsimage

 java.io.IOException: NameNode is not formatted.

Have you formatted HDFS (step 3.4)?


On Tue, Aug 26, 2014 at 3:08 AM, Blanca Hernandez 
blanca.hernan...@willhaben.at wrote:

  Hi!



  I have just installed hadoop in my windows x64 machine. I followed carefully
 the instructions in https://wiki.apache.org/hadoop/Hadoop2OnWindows but
 in the 3.5 and 3.6 points I have some problems I can not handle.



 %HADOOP_PREFIX%\sbin\start-dfs.cmd





  The datanode can not connect: 14/08/26 12:01:30 WARN datanode.DataNode:
 Problem connecting to server: 0.0.0.0/0.0.0.0:19000

 And the namenode does not even start: 14/08/26 12:01:09 WARN
 namenode.FSNamesystem: Encountered exception loading fsimage
 java.io.IOException: NameNode is not formatted.



  Trying to get the mapreduce example indicated in point 3.6 running, I get
  the connection exception again. That brings me to the
  http://wiki.apache.org/hadoop/ConnectionRefused page where the exception
  is explained. So I guess I have some misunderstandings with the
  configuration. I tried to find information about that and found the page
  http://wiki.apache.org/hadoop/HowToConfigure but it is still not very clear
  for me.



 I attach my config files, the ones I modified maybe you can help me out…



 Many thanks!!




Re: Create HDFS directory fails

2014-07-29 Thread Arpit Agarwal
FileSystem.create creates regular files (the documentation could be clearer
about this).

FileSystem.mkdirs creates directories.
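
A minimal sketch of the distinction, reusing the paths from this thread (the file written inside the new directory is only for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MkdirsExample {
  public static void main(String[] args) throws Exception {
    FileSystem hdfs = FileSystem.get(new Configuration());
    Path dir = new Path("/user/logger/dev2/tmp2");
    // mkdirs creates the directory and any missing parent directories
    if (!hdfs.mkdirs(dir)) {
      throw new java.io.IOException("mkdirs failed for " + dir);
    }
    // create is for regular files, e.g. a file inside the new directory
    try (FSDataOutputStream out = hdfs.create(new Path(dir, "one.dat"))) {
      out.writeBytes("hello\n");
    }
  }
}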


On Tue, Jul 29, 2014 at 11:07 AM, R J rj201...@yahoo.com wrote:

 Thank you.
 I tried all the following but none works:

 FSDataOutputStream out = hdfs.create(new Path(/user/logger/dev2/));
 FSDataOutputStream out = hdfs.create(new Path(/user/logger/dev2));

 Path hdfsFile = new Path(/user/logger/dev2/one.dat);
 FSDataOutputStream out = hdfs.create(hdfsFile);

 Path hdfsFile = new Path(/user/logger/dev2);
 FSDataOutputStream out = hdfs.create(hdfsFile);


 Path hdfsFile = new Path(/user/logger/dev2/);
 FSDataOutputStream out = hdfs.create(hdfsFile);




   On Tuesday, July 29, 2014 1:57 AM, Wellington Chevreuil 
 wellington.chevre...@gmail.com wrote:


 Hum, I'm not sure, but I think through the API, you have to create each
 folder level at a time. For instance, if your current path is
 /user/logger and you want to create /user/logger/dev2/tmp2, you have to
 first do hdfs.create(new Path(/user/logger/dev2)), then hdfs.create(new
 Path(/user/logger/dev2/tmp2)). Have you already tried that?

 On 29 Jul 2014, at 08:43, R J rj201...@yahoo.com wrote:

 Hi All,

 I am trying to programmatically create a directory in HDFS but it fails
 with error.

 This the part of my code:
 Path hdfsFile = new Path(/user/logger/dev2/tmp2);
 try {
 FSDataOutputStream out = hdfs.create(hdfsFile);
 }
 And I get this error:
 java.io.IOException: Mkdirs failed to create /user/logger/dev2/tmp2
 at
 org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:379)
 at
 org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:365)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:584)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:565)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:472)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:464)
 at PutMerge.main(PutMerge.java:20)

 I can create the same HDFS directory (and then remove) via hadoop command
 as the same user who is running the java executable:
 $hadoop fs -mkdir /user/logger/dev/tmp2
 $hadoop fs -rmr /user/logger/dev/tmp2
 (above works)

 Here is my entire code:
 --PutMerge.java--
 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FSDataInputStream;
 import org.apache.hadoop.fs.FSDataOutputStream;
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 public class PutMerge {

 public static void main(String[] args) throws IOException {
 Configuration conf = new Configuration();
 FileSystem hdfs = FileSystem.get(conf);
 FileSystem local = FileSystem.getLocal(conf);

 Path inputDir = new Path(/home/tmp/test);
 Path hdfsFile = new Path(/user/logger/dev/tmp2);

 try {
 FileStatus[] inputFiles = local.listStatus(inputDir);
 FSDataOutputStream out = hdfs.create(hdfsFile);

 for (int i=0; iinputFiles.length; i++) {
 System.out.println(inputFiles[i].getPath().getName());
 FSDataInputStream in = local.open(inputFiles[i].getPath());
 byte buffer[] = new byte[256];
 int bytesRead = 0;
 while( (bytesRead = in.read(buffer))  0) {
 out.write(buffer, 0, bytesRead);
 }
 in.close();
 }
 out.close();
 } catch (IOException e) {
 e.printStackTrace();
 }
 }
 }
 --








Re: Is hdfs Append to file command ready for production?

2014-07-29 Thread Arpit Agarwal
Most of those Jiras are for the append feature in general and not the
appendToFile CLI.

The append feature used via the FileSystem API is stable in Apache Hadoop
2.2 and later. I added the appendToFile CLI as a convenience and it has not
been tested/tuned for performance so YMMV.
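
For reference, a minimal sketch of the append path through the FileSystem API described above; the file path here is hypothetical and the file must already exist on the cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendSketch {
  public static void main(String[] args) throws Exception {
    try (FileSystem fs = FileSystem.get(new Configuration())) {
      Path file = new Path("/data/events.log"); // hypothetical, must already exist
      try (FSDataOutputStream out = fs.append(file)) {
        out.writeBytes("one more record\n");
        out.hsync(); // flush to the datanodes before closing, if durability matters here
      }
    }
  }
}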


On Tue, Jul 29, 2014 at 9:29 AM, Manikanda Prabhu gmkprabhu1...@gmail.com
wrote:

 Hi,

  We are planning to use one of the hdfs commands, appendToFile, in our file
  process. Would someone confirm whether it's production ready, or whether
  there are any open issues still in discussion?
 
  In my research, I found the following JIRAs directly related to this
  command, and they are all closed (except HDFS-1060). Please let me know if I
  missed anything.

  JIRA Id      Description                                                        Status
  HDFS-1060    Append/flush should support concurrent tailer use case            Open
  HADOOP-6239
  HDFS-4905    Command-line for append                                            Fixed
  HDFS-744     Support hsync in HDFS                                              Fixed
  HDFS-222     Support for concatenating of files into a single file              Fixed
  HDFS-265     Revisit append - This jira revisits append, aiming for a design
               and implementation supporting a semantics that are acceptable
               to its users.                                                      Fixed
  HADOOP-5224  Disable append                                                     Fixed
  HADOOP-5332  Make support for file append API configurable                      Fixed
  HDFS-200     In HDFS, sync() not yet guarantees data available to the new
               readers                                                            Fixed
  HADOOP-1708  Make files visible in the namespace as soon as they are created    Fixed
  HADOOP-1700  Append to files in HDFS                                            Fixed
 Regards,
 Mani




Re: Is hdfs Append to file command ready for production?

2014-07-29 Thread Arpit Agarwal
By tested I meant tested for performance. It is fine functionally.


On Tue, Jul 29, 2014 at 3:04 PM, Arpit Agarwal aagar...@hortonworks.com
wrote:

 Most of those Jiras are for the append feature in general and not the
 appendToFile CLI.

 The append feature used via the FileSystem API is stable in Apache Hadoop
 2.2 and later. I added the appendToFile CLI as a convenience and it has not
 been tested/tuned for performance so YMMV.


 On Tue, Jul 29, 2014 at 9:29 AM, Manikanda Prabhu gmkprabhu1...@gmail.com
  wrote:

 Hi,

 We are planning to use one of the hdfs commands appendToFile in our
 file process, would someone confirm it's production ready or any open
 issues thats still in discussion.

 In my research, I found the following JIRA's directly or related to this
 command and its all closed (except HDFS 1060). please let me know if i
 missed anything

  JIRA Id      Description                                                        Status
  HDFS-1060    Append/flush should support concurrent tailer use case            Open
  HADOOP-6239
  HDFS-4905    Command-line for append                                            Fixed
  HDFS-744     Support hsync in HDFS                                              Fixed
  HDFS-222     Support for concatenating of files into a single file              Fixed
  HDFS-265     Revisit append - This jira revisits append, aiming for a design
               and implementation supporting a semantics that are acceptable
               to its users.                                                      Fixed
  HADOOP-5224  Disable append                                                     Fixed
  HADOOP-5332  Make support for file append API configurable                      Fixed
  HDFS-200     In HDFS, sync() not yet guarantees data available to the new
               readers                                                            Fixed
  HADOOP-1708  Make files visible in the namespace as soon as they are created    Fixed
  HADOOP-1700  Append to files in HDFS                                            Fixed
 Regards,
 Mani






Re: how to reduce delay in HDFS restart

2014-07-24 Thread Arpit Agarwal
Which version of Hadoop?

Yes, saveNamespace as you described will checkpoint the FsImage and reset
your edits log, so it will reduce startup time.
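
A sketch of the same sequence through the client API, equivalent to the dfsadmin commands quoted below; this assumes the caller is the HDFS superuser and uses the Hadoop 2.x client classes.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction;

public class CheckpointNow {
  public static void main(String[] args) throws Exception {
    DistributedFileSystem dfs =
        (DistributedFileSystem) FileSystem.get(new Configuration());
    dfs.setSafeMode(SafeModeAction.SAFEMODE_ENTER);  // hdfs dfsadmin -safemode enter
    dfs.saveNamespace();                             // hdfs dfsadmin -saveNamespace
    dfs.setSafeMode(SafeModeAction.SAFEMODE_LEAVE);  // hdfs dfsadmin -safemode leave
  }
}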


On Thu, Jul 24, 2014 at 8:25 AM, Anfernee Xu anfernee...@gmail.com wrote:

  Yes, I have a secondary NN, but without HA. After I killed the NN and
  secondary NN, the startup time was still too long (4 hours). What else
  should I do? Will hadoop dfsadmin -saveNamespace resolve the issue?

 Thanks for your help.


 On Wed, Jul 23, 2014 at 10:58 PM, Stanley Shi s...@gopivotal.com wrote:

 Do you have a secondary namenode running? Secondary NN is used for this
 purpose;
 Also, if you have HDFS HA enabled, this problem will also not occur.

 Regards,
 *Stanley Shi,*



 On Tue, Jul 22, 2014 at 7:24 AM, Anfernee Xu anfernee...@gmail.com
 wrote:

 Hi,

  For some reason, all PID files are missing in my cluster, so I had to
  manually kill all java processes on all machines. Then I restarted
  HDFS, but it took a very long time applying the changes in the edit log file, so my
  question is how can I reduce the delay? My understanding is as follows;
  could someone please give some comments on it?

 hadoop dfsadmin -safemode enter

 #save current in-mem data to image file and reset edit log
 hadoop dfsadmin -saveNamespace


 --
 --Anfernee





 --
 --Anfernee




Re: RegionServer many socket fds are in CLOSE_WAIT and not getting cleared

2014-07-24 Thread Arpit Agarwal
+Hbase User


On Wed, Jul 23, 2014 at 11:04 PM, Shankar hiremath 
shankar.hirem...@huawei.com wrote:

  Dear All,

 Observation:
  I have an HBase cluster with Kerberos enabled. When the Region Server
  starts up, we observe some 20-30 socket fds in CLOSE_WAIT state.
  After that, when the Region Server starts opening regions, the number of socket fds in
  CLOSE_WAIT starts increasing gradually (almost the same as the number of regions
  opened by the region server),
  and none of these CLOSE_WAIT sockets are getting cleared up.

  /hbase lsof -i | grep `jps | grep RegionServer | cut -d ' ' -f1`
 java18028 shankar1  118u  IPv6 18552894  0t0  TCP
 XX-XX-XX-XX:60020 (LISTEN)
 java18028 shankar1  160u  IPv6 18548520  0t0  TCP *:60030 (LISTEN)
 java18028 shankar1  167u  IPv6 18548522  0t0  TCP
 XX-XX-XX-XX:42534-host-10-18-40-52:eforward (ESTABLISHED)
 java18028 shankar1  172u  IPv6 18552916  0t0  TCP
 XX-XX-XX-XX:42535-host-10-18-40-52:eforward (ESTABLISHED)
 java18028 shankar1  173u  IPv6 18551227  0t0  TCP
 XX-XX-XX-XX:49646-XX-XX-XX-XX:6 (ESTABLISHED)
 java18028 shankar1  178u  IPv6 18551237  0t0  TCP
 XX-XX-XX-XX:62668-XX-XX-XX-XX:busboy (ESTABLISHED)
 java18028 shankar1  185u  IPv6 18548549  0t0  TCP
 XX-XX-XX-XX:21856-host-10-18-40-134:eforward (ESTABLISHED)
 java18028 shankar1  187u  IPv6 18548558  0t0  TCP
 XX-XX-XX-XX:62673-XX-XX-XX-XX:busboy (ESTABLISHED)
 java18028 shankar1  188u  IPv6 18601323  0t0  TCP
 XX-XX-XX-XX:63168-XX-XX-XX-XX:busboy (CLOSE_WAIT)
 java18028 shankar1  189u  IPv6 18601322  0t0  TCP
 XX-XX-XX-XX:63167-XX-XX-XX-XX:busboy (CLOSE_WAIT)
 java18028 shankar1  190u  IPv6 18601324  0t0  TCP
 XX-XX-XX-XX:63169-XX-XX-XX-XX:busboy (CLOSE_WAIT)
 java18028 shankar1  191r  IPv6 18592423  0t0  TCP
 XX-XX-XX-XX:63087-XX-XX-XX-XX:busboy (CLOSE_WAIT)
 java18028 shankar1  193u  IPv6 18593210  0t0  TCP
 XX-XX-XX-XX:63090-XX-XX-XX-XX:busboy (CLOSE_WAIT)
 java18028 shankar1  194u  IPv6 18548560  0t0  TCP
 XX-XX-XX-XX:62675-XX-XX-XX-XX:busboy (CLOSE_WAIT)
 java18028 shankar1  195u  IPv6 18592428  0t0  TCP
 XX-XX-XX-XX:63093-XX-XX-XX-XX:busboy (CLOSE_WAIT)
 java18028 shankar1  196u  IPv6 18593218  0t0  TCP
 XX-XX-XX-XX:63096-XX-XX-XX-XX:busboy (CLOSE_WAIT)
 java18028 shankar1  197u  IPv6 18591423  0t0  TCP
 XX-XX-XX-XX:63105-XX-XX-XX-XX:busboy (CLOSE_WAIT)
 java18028 shankar1  201u  IPv6 18592431  0t0  TCP
 XX-XX-XX-XX:63099-XX-XX-XX-XX:busboy (CLOSE_WAIT)
 java18028 shankar1  202u  IPv6 18592433  0t0  TCP
 XX-XX-XX-XX:63102-XX-XX-XX-XX:busboy (CLOSE_WAIT)
 java18028 shankar1  203u  IPv6 18552317  0t0  TCP
 XX-XX-XX-XX:62681-XX-XX-XX-XX:busboy (CLOSE_WAIT)

 

 

 

 

 

  Any input or suggestion would be helpful. Or is it a bug?

 Regards
 -Shankar










Re: Replace a block with a new one

2014-07-18 Thread Arpit Agarwal
IMHO this is a spectacularly bad idea. Is it a one-off event? Why not just
take the perf hit and recreate the file?

If you need to do this regularly you should consider a mutable file store
like HBase. If you start modifying blocks from under HDFS you open up all
sorts of consistency issues.
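
For contrast, a sketch of what a mutable store looks like from client code, using the HBase 1.x style client API; the table, row and column names here are made up. Overwriting data is just another put against the same cell, with no block-level surgery underneath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class OverwriteCell {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("mytable"))) {
      Put put = new Put(Bytes.toBytes("row-42"));
      // rewriting the cell replaces the visible value; compactions clean up old versions
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"),
          Bytes.toBytes("new contents"));
      table.put(put);
    }
  }
}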




On Fri, Jul 18, 2014 at 2:10 PM, Shumin Guo gsmst...@gmail.com wrote:

 That will break the consistency of the file system, but it doesn't hurt to
 try.
 On Jul 17, 2014 8:48 PM, Zesheng Wu wuzeshen...@gmail.com wrote:

  How about writing a new block with a new checksum file, and replacing both
  the old block file and the checksum file?


 2014-07-17 19:34 GMT+08:00 Wellington Chevreuil 
 wellington.chevre...@gmail.com:

 Hi,

  there's no way to do that, as HDFS does not provide file update
  features. You'll need to write a new file with the changes.

 Notice that even if you manage to find the physical block replica files
 on the disk, corresponding to the part of the file you want to change, you
 can't simply update it manually, as this would give a different checksum,
 making HDFS mark such blocks as corrupt.

 Regards,
 Wellington.



 On 17 Jul 2014, at 10:50, Zesheng Wu wuzeshen...@gmail.com wrote:

  Hi guys,
 
   I recently encountered a scenario which needs to replace an existing block
  with a newly written block.
   The most straightforward way to do this may be like the following:
   Suppose the original file is A, and we write a new file B which is
  composed of the new data blocks; then we merge A and B into C, which is the
  file we wanted.
   The obvious shortcoming of this method is the waste of network bandwidth.
 
  I'm wondering whether there is a way to replace the old block by the
 new block directly.
  Any thoughts?
 
  --
  Best Wishes!
 
  Yours, Zesheng




 --
 Best Wishes!

 Yours, Zesheng





Re: multiple map tasks writing in same hdfs file -issue

2014-07-10 Thread Arpit Agarwal
HDFS is single-writer, multiple-reader (see sec 8.3.1 of
http://aosabook.org/en/hdfs.html). You cannot have multiple writers for a
single file at a time.
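
A common workaround, shown only as a sketch (the class name and output path are made up): have each map task write its own part file, named after its task attempt, and merge the parts afterwards (for example with hadoop fs -getmerge or a single reducer) instead of having every task open the same HDFS file.

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class PerTaskOutputMapper extends Mapper<LongWritable, Text, Text, Text> {
  private final StringBuilder results = new StringBuilder();

  @Override
  protected void map(LongWritable key, Text value, Context context) {
    // placeholder for computing the 8 values from this record
    results.append(value.toString().length()).append('\n');
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    FileSystem fs = FileSystem.get(context.getConfiguration());
    // one file per task attempt: no two tasks ever write to the same HDFS file
    Path out = new Path("/user/rab/results/" + context.getTaskAttemptID());
    try (FSDataOutputStream os = fs.create(out, true)) {
      os.writeBytes(results.toString());
    }
  }
}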


On Thu, Jul 10, 2014 at 2:55 AM, rab ra rab...@gmail.com wrote:

 Hello


  I have one use-case that spans multiple map tasks in a hadoop environment. I
  use hadoop 1.2.1 with 6 task nodes. Each map task writes its output
  into a file stored in hdfs. This file is shared across all the map tasks.
  Though they all compute their output, some of the results are missing in the
  output file.



  The output file is an excel file with 8 parameters (headings). Each map
  task is supposed to compute all these 8 values, and save each one as soon as it
  is computed. This means the programming logic of a map task opens the
  file, writes the value and closes it, 8 times.



  Can someone give me a hint on what's going wrong here?



 Is it possible to make more than one map task to write in a shared file in
 HDFS?

 Regards
 Rab




Re: Copy hdfs block from one data node to another

2014-07-09 Thread Arpit Agarwal
The balancer does something similar. It uses
DataTransferProtocol.replaceBlock.
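
For the "given the file name, start and end bytes" part of the question, a sketch of mapping a byte range to the datanodes that currently hold it, using only the public FileSystem API; the file path comes in as a command-line argument.

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockHosts {
  public static void main(String[] args) throws Exception {
    try (FileSystem fs = FileSystem.get(new Configuration())) {
      FileStatus status = fs.getFileStatus(new Path(args[0]));
      // one BlockLocation per block: offset, length and the hosts holding a replica
      for (BlockLocation b : fs.getFileBlockLocations(status, 0, status.getLen())) {
        System.out.println(b.getOffset() + " + " + b.getLength()
            + " -> " + Arrays.toString(b.getHosts()));
      }
    }
  }
}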


On Wed, Jul 9, 2014 at 9:20 PM, sudhakara st sudhakara...@gmail.com wrote:

  You can get info about all blocks stored on a particular data node, i.e. the
  block report. But you have to handle the move at the block level, not at the file or
  start and end bytes level.


 On Thu, Jul 10, 2014 at 2:49 AM, Chris Mawata chris.maw...@gmail.com
 wrote:

 Haven't looked at the source but the thing you are trying to do sounds
 similar to what happens when you are decommissioning a datanode. I would
 hunt for that code.
 Cheers
 Chris
 On Jul 9, 2014 3:41 PM, Yehia Elshater y.z.elsha...@gmail.com wrote:

 Hi Chris,

  Actually I need this functionality for my research, basically for fault
  tolerance. I can calculate some failure probability for some data nodes
  after a certain unit of time. So I need to copy all the blocks residing on
  these nodes to other nodes.

 Thanks
 Yehia


 On 7 July 2014 20:45, Chris Mawata chris.maw...@gmail.com wrote:

 Can you outline why one would want to do that? The blocks are
 disposable so it is strange to manipulate them directly.
  On Jul 7, 2014 8:16 PM, Yehia Elshater y.z.elsha...@gmail.com
 wrote:

 Hi All,

 How can copy a certain hdfs block (given the file name, start and end
 bytes) from one node to another node ?

 Thanks
 Yehia





 --

 Regards,
 ...sudhakara





Re: Fwd: Trying to build Hadoop on Windows 8

2014-06-13 Thread Arpit Agarwal
Unfortunately the SDK for Windows 8 does not include command-line build
tools. You can build a binary distribution on Windows 7 using the steps
outlined in BUILDING.txt and it should work on Windows 8.

If you must build on Windows 8 you can try a couple of things:
- Developer command prompt via Visual Studio Express 2013, if one is
available.
- Use Windows 7 SDK on Windows 8.

If you do get either of these working updated instructions or patches would
be welcome.


On Fri, Jun 13, 2014 at 9:19 AM, Néstor Boscán nesto...@gmail.com wrote:

 The error I'm getting is building Apache Hadoop Commons:

 [ERROR] Failed to execute goal
 org.codehaus.mojo:exec-maven-plugin:1.2:exec (compile-ms-winutils) on
 project hadoop-common: Command execution failed. Process exited with an
 error: 1(Exit value: 1) - [Help 1]

 The BUILDING.txt documentation explains that I have to run Windows SDK
 Command Prompt, but, in Windows 8 I can't find that shortcut. I tried
 adding the msbuild folder to the PATH but it doesn't work.

 Regards,

 Néstor


 On Fri, Jun 13, 2014 at 11:42 AM, Néstor Boscán nesto...@gmail.com
 wrote:

 And it looks like Hortonworks is only certified with Windows Server, not
 Windows 8.


 On Fri, Jun 13, 2014 at 10:57 AM, Néstor Boscán nesto...@gmail.com
 wrote:

 Thanks but I'm trying to stick with the Hadoop installation.


 On Fri, Jun 13, 2014 at 10:45 AM, Publius t...@yahoo.com wrote:

 maybe just download hortonworks for windows

 Download Hadoop http://hortonworks.com/hdp/downloads/








 KMG 365
   --
  *From:* Néstor Boscán nesto...@gmail.com
 *To:* user@hadoop.apache.org
 *Sent:* Friday, June 13, 2014 7:31 AM
 *Subject:* Fwd: Trying to build Hadoop on Windows 8

 Hi

  I'm trying to build Hadoop on Windows 8. I have:

 Java 1.6.0_45 (JAVA_HOME set using old DOS name)
 Maven 3.2 (M2_HOME and PATH set)
 Protoc 2.5.0 (The 32-bit build that I found, PATH set)
 Cygwin 64-bit (PATH set)
 Windows SDK

  When I try to run the build I get an error that it cannot build using
  the Windows tools. Looking through the Internet I found that with Visual
  Studio 2010 there is a batch file that sets the Windows SDK environment
  variables for this but I don't know how to do this if I only use Windows
  SDK.

 Regards,

 Néstor









Re: Fwd: Trying to build Hadoop on Windows 8

2014-06-13 Thread Arpit Agarwal
Could you share the complete build output?

Feel free to put it on pastebin or similar if it's a lot of text.


On Fri, Jun 13, 2014 at 12:00 PM, Néstor Boscán nesto...@gmail.com wrote:

 Hi thanks a lot for the info

 I started using a Windows 7 PC and I'm using the Windows SDK Command
 Prompt but I'm still getting the same error:

 [ERROR] Failed to execute goal
 org.codehaus.mojo:exec-maven-plugin:1.2:exec (compile-ms-winutils) on
 project hadoop-common: Command execution failed. Process exited with an
 error: 1(Exit value: 1) - [Help 1]

  I tried with hadoop source 2.2.0, 2.3.0 and 2.4.0 with no success. I
  checked BUILDING.txt and everything is exactly as the file describes.

 Regards,

 Nestor


 On Fri, Jun 13, 2014 at 2:22 PM, Arpit Agarwal aagar...@hortonworks.com
 wrote:

 Unfortunately the SDK for Windows 8 does not include command-line build
 tools. You can build a binary distribution on Windows 7 using the steps
 outlined in BUILDING.txt and it should work on Windows 8.

 If you must build on Windows 8 you can try a couple of things:
 - Developer command prompt via Visual Studio Express 2013, if one is
 available.
 - Use Windows 7 SDK on Windows 8.

 If you do get either of these working updated instructions or patches
 would be welcome.


 On Fri, Jun 13, 2014 at 9:19 AM, Néstor Boscán nesto...@gmail.com
 wrote:

 The error I'm getting is building Apache Hadoop Commons:

 [ERROR] Failed to execute goal
 org.codehaus.mojo:exec-maven-plugin:1.2:exec (compile-ms-winutils) on
 project hadoop-common: Command execution failed. Process exited with an
 error: 1(Exit value: 1) - [Help 1]

 The BUILDING.txt documentation explains that I have to run Windows SDK
 Command Prompt, but, in Windows 8 I can't find that shortcut. I tried
 adding the msbuild folder to the PATH but it doesn't work.

 Regards,

 Néstor


 On Fri, Jun 13, 2014 at 11:42 AM, Néstor Boscán nesto...@gmail.com
 wrote:

 And it looks like Hortonworks is only certified with Windows Server,
 not Windows 8.


 On Fri, Jun 13, 2014 at 10:57 AM, Néstor Boscán nesto...@gmail.com
 wrote:

 Thanks but I'm trying to stick with the Hadoop installation.


 On Fri, Jun 13, 2014 at 10:45 AM, Publius t...@yahoo.com wrote:

 maybe just download hortonworks for windows

 Download Hadoop http://hortonworks.com/hdp/downloads/








 KMG 365
   --
  *From:* Néstor Boscán nesto...@gmail.com
 *To:* user@hadoop.apache.org
 *Sent:* Friday, June 13, 2014 7:31 AM
 *Subject:* Fwd: Trying to build Hadoop on Windows 8

 Hi

 I'm trying to build Hadoop on WIndows 8. I have:

 Java 1.6.0_45 (JAVA_HOME set using old DOS name)
 Maven 3.2 (M2_HOME and PATH set)
 Protoc 2.5.0 (The 32-bit build that I found, PATH set)
 Cygwin 64-bit (PATH set)
 Windows SDK

 When I try to run the build I get an error that it cannot build using
 the Windows tools. Lookig throught the Internet I found that with Visual
 Studio 2010 there is a batch file that sets the Windows SDK enviroment
 variables for this but I don't know how to do this if I only use Windows
 SDK.

 Regards,

 Néstor













Re: Fwd: Trying to build Hadoop on Windows 8

2014-06-13 Thread Arpit Agarwal
Also in case you are using Cygwin, please don't! You should be building
from the Windows SDK command prompt.

See if this helps - https://wiki.apache.org/hadoop/Hadoop2OnWindows



On Fri, Jun 13, 2014 at 12:06 PM, Arpit Agarwal aagar...@hortonworks.com
wrote:

 Could you share the complete build output?

 Feel free to put it on pastebin or similar if it's a lot of text.


 On Fri, Jun 13, 2014 at 12:00 PM, Néstor Boscán nesto...@gmail.com
 wrote:

 Hi thanks a lot for the info

 I started using a Windows 7 PC and I'm using the Windows SDK Command
 Prompt but I'm still getting the same error:

 [ERROR] Failed to execute goal
 org.codehaus.mojo:exec-maven-plugin:1.2:exec (compile-ms-winutils) on
 project hadoop-common: Command execution failed. Process exited with an
 error: 1(Exit value: 1) - [Help 1]

 I tried with hadoop source 2.2.0, 2.3.0 and 2.4.0 with no success. I
 checked the BUILDINGS.txt and everything is exactly as the file describes.

 Regards,

 Nestor


 On Fri, Jun 13, 2014 at 2:22 PM, Arpit Agarwal aagar...@hortonworks.com
 wrote:

 Unfortunately the SDK for Windows 8 does not include command-line build
 tools. You can build a binary distribution on Windows 7 using the steps
 outlined in BUILDING.txt and it should work on Windows 8.

 If you must build on Windows 8 you can try a couple of things:
 - Developer command prompt via Visual Studio Express 2013, if one is
 available.
 - Use Windows 7 SDK on Windows 8.

 If you do get either of these working updated instructions or patches
 would be welcome.


 On Fri, Jun 13, 2014 at 9:19 AM, Néstor Boscán nesto...@gmail.com
 wrote:

 The error I'm getting is building Apache Hadoop Commons:

 [ERROR] Failed to execute goal
 org.codehaus.mojo:exec-maven-plugin:1.2:exec (compile-ms-winutils) on
 project hadoop-common: Command execution failed. Process exited with an
 error: 1(Exit value: 1) - [Help 1]

 The BUILDING.txt documentation explains that I have to run Windows SDK
 Command Prompt, but, in Windows 8 I can't find that shortcut. I tried
 adding the msbuild folder to the PATH but it doesn't work.

 Regards,

 Néstor


 On Fri, Jun 13, 2014 at 11:42 AM, Néstor Boscán nesto...@gmail.com
 wrote:

 And it looks like Hortonworks is only certified with Windows Server,
 not Windows 8.


 On Fri, Jun 13, 2014 at 10:57 AM, Néstor Boscán nesto...@gmail.com
 wrote:

 Thanks but I'm trying to stick with the Hadoop installation.


 On Fri, Jun 13, 2014 at 10:45 AM, Publius t...@yahoo.com wrote:

 maybe just download hortonworks for windows

 Download Hadoop http://hortonworks.com/hdp/downloads/








 KMG 365
   --
  *From:* Néstor Boscán nesto...@gmail.com
 *To:* user@hadoop.apache.org
 *Sent:* Friday, June 13, 2014 7:31 AM
 *Subject:* Fwd: Trying to build Hadoop on Windows 8

 Hi

 I'm trying to build Hadoop on WIndows 8. I have:

 Java 1.6.0_45 (JAVA_HOME set using old DOS name)
 Maven 3.2 (M2_HOME and PATH set)
 Protoc 2.5.0 (The 32-bit build that I found, PATH set)
 Cygwin 64-bit (PATH set)
 Windows SDK

 When I try to run the build I get an error that it cannot build
 using the Windows tools. Lookig throught the Internet I found that with
 Visual Studio 2010 there is a batch file that sets the Windows SDK
 enviroment variables for this but I don't know how to do this if I only 
 use
 Windows SDK.

 Regards,

 Néstor














Re: Fwd: Trying to build Hadoop on Windows 8

2014-06-13 Thread Arpit Agarwal
 .. SKIPPED
 [INFO] hadoop-yarn-applications-distributedshell . SKIPPED
 [INFO] hadoop-yarn-applications-unmanaged-am-launcher  SKIPPED
 [INFO] hadoop-yarn-site .. SKIPPED
 [INFO] hadoop-yarn-project ... SKIPPED
 [INFO] hadoop-mapreduce-client ... SKIPPED
 [INFO] hadoop-mapreduce-client-core .. SKIPPED
 [INFO] hadoop-mapreduce-client-common  SKIPPED
 [INFO] hadoop-mapreduce-client-shuffle ... SKIPPED
 [INFO] hadoop-mapreduce-client-app ... SKIPPED
 [INFO] hadoop-mapreduce-client-hs  SKIPPED
 [INFO] hadoop-mapreduce-client-jobclient . SKIPPED
 [INFO] hadoop-mapreduce-client-hs-plugins  SKIPPED
 [INFO] Apache Hadoop MapReduce Examples .. SKIPPED
 [INFO] hadoop-mapreduce .. SKIPPED
 [INFO] Apache Hadoop MapReduce Streaming . SKIPPED
 [INFO] Apache Hadoop Distributed Copy  SKIPPED
 [INFO] Apache Hadoop Archives  SKIPPED
 [INFO] Apache Hadoop Rumen ... SKIPPED
 [INFO] Apache Hadoop Gridmix . SKIPPED
 [INFO] Apache Hadoop Data Join ... SKIPPED
 [INFO] Apache Hadoop Extras .. SKIPPED
 [INFO] Apache Hadoop Pipes ... SKIPPED
 [INFO] Apache Hadoop OpenStack support ... SKIPPED
 [INFO] Apache Hadoop Client .. SKIPPED
 [INFO] Apache Hadoop Mini-Cluster  SKIPPED
 [INFO] Apache Hadoop Scheduler Load Simulator  SKIPPED
 [INFO] Apache Hadoop Tools Dist .. SKIPPED
 [INFO] Apache Hadoop Tools ... SKIPPED
 [INFO] Apache Hadoop Distribution  SKIPPED
 [INFO]
 
 [INFO] BUILD FAILURE
 [INFO]
 
 [INFO] Total time: 21.678s
 [INFO] Finished at: Fri Jun 13 14:39:14 VET 2014
 [INFO] Final Memory: 57M/498M
 [INFO]
 

 Regards,

 Nestor


 On Fri, Jun 13, 2014 at 2:36 PM, Arpit Agarwal aagar...@hortonworks.com
 wrote:

 Could you share the complete build output?

 Feel free to put it on pastebin or similar if it's a lot of text.


 On Fri, Jun 13, 2014 at 12:00 PM, Néstor Boscán nesto...@gmail.com
 wrote:

 Hi thanks a lot for the info

 I started using a Windows 7 PC and I'm using the Windows SDK Command
 Prompt but I'm still getting the same error:

 [ERROR] Failed to execute goal
 org.codehaus.mojo:exec-maven-plugin:1.2:exec (compile-ms-winutils) on
 project hadoop-common: Command execution failed. Process exited with an
 error: 1(Exit value: 1) - [Help 1]

 I tried with hadoop source 2.2.0, 2.3.0 and 2.4.0 with no success. I
 checked BUILDING.txt and everything is exactly as the file describes.

 Regards,

 Nestor


 On Fri, Jun 13, 2014 at 2:22 PM, Arpit Agarwal 
 aagar...@hortonworks.com wrote:

 Unfortunately the SDK for Windows 8 does not include command-line
 build tools. You can build a binary distribution on Windows 7 using the
 steps outlined in BUILDING.txt and it should work on Windows 8.

 If you must build on Windows 8 you can try a couple of things:
 - Developer command prompt via Visual Studio Express 2013, if one is
 available.
 - Use Windows 7 SDK on Windows 8.

 If you do get either of these working updated instructions or patches
 would be welcome.


 On Fri, Jun 13, 2014 at 9:19 AM, Néstor Boscán nesto...@gmail.com
 wrote:

 The error I'm getting is building Apache Hadoop Commons:

 [ERROR] Failed to execute goal
 org.codehaus.mojo:exec-maven-plugin:1.2:exec (compile-ms-winutils) on
 project hadoop-common: Command execution failed. Process exited with an
 error: 1(Exit value: 1) - [Help 1]

 The BUILDING.txt documentation explains that I have to run Windows
 SDK Command Prompt, but, in Windows 8 I can't find that shortcut. I tried
 adding the msbuild folder to the PATH but it doesn't work.

 Regards,

 Néstor


 On Fri, Jun 13, 2014 at 11:42 AM, Néstor Boscán nesto...@gmail.com
 wrote:

 And it looks like Hortonworks is only certified with Windows Server,
 not Windows 8.


 On Fri, Jun 13, 2014 at 10:57 AM, Néstor Boscán nesto...@gmail.com
 wrote:

 Thanks but I'm trying to stick with the Hadoop installation.


 On Fri, Jun 13, 2014 at 10:45 AM, Publius t...@yahoo.com wrote:

 maybe just download hortonworks for windows

 Download Hadoop http://hortonworks.com/hdp/downloads/

Re: Fwd: Trying to build Hadoop on Windows 8

2014-06-13 Thread Arpit Agarwal
No idea. Can you just try to update the environment in the cmd window to
point to the latest .NET and rebuild?


On Fri, Jun 13, 2014 at 2:12 PM, Néstor Boscán nesto...@gmail.com wrote:

 Thanks for the help

 I have the latest .NET Framework installed. Checking the environment
 variables, Windows SDK is pointing to 3.5. I'm not very knowledgeable about
 Windows development. Is there a way to point to .NET 4?

 Regards,

 Nestor


 On Fri, Jun 13, 2014 at 4:26 PM, Arpit Agarwal aagar...@hortonworks.com
 wrote:

 Google search for the error message shows this.


 https://stackoverflow.com/questions/5107757/using-msbuild-with-vs2010-generated-vcxproj-file-as-target-error-msb4066-at

 Tried updating .NET?


 On Fri, Jun 13, 2014 at 12:21 PM, Néstor Boscán nesto...@gmail.com
 wrote:

 Tried to build using 32bit same error.

 Regards,

 Nestor


 On Fri, Jun 13, 2014 at 2:41 PM, Néstor Boscán nesto...@gmail.com
 wrote:

 Hi Arpit

 Yes I'm using the Windows SDK Command Prompt. I only use cygwin for the
 dependency with make, ssh, etc.

 Here's the Hadoop Common compilation:

 [INFO]
 
 [INFO] Building Apache Hadoop Common 2.4.0
 [INFO]
 
 [INFO]
 [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-os) @
 hadoop-common ---
 [INFO]
 [INFO] --- maven-antrun-plugin:1.7:run (create-testdirs) @
 hadoop-common ---
 [INFO] Executing tasks

 main:
 [INFO] Executed tasks
 [INFO]
 [INFO] --- hadoop-maven-plugins:2.4.0:protoc (compile-protoc) @
 hadoop-common ---
 [INFO]
 [INFO] --- hadoop-maven-plugins:2.4.0:version-info (version-info) @
 hadoop-common ---
 [WARNING] [svn, info] failed with error code 1
 [WARNING] [git, branch] failed: java.io.IOException: Cannot run program
 git: CreateProcess error=2, The system cannot find the file specified
 [INFO] SCM: NONE
 [INFO] Computed MD5: 375b2832a6641759c6eaf6e3e998147
 [INFO]
 [INFO] --- maven-resources-plugin:2.2:resources (default-resources) @
 hadoop-common ---
 [INFO] Using default encoding to copy filtered resources.
 [INFO]
 [INFO] --- maven-compiler-plugin:2.5.1:compile (default-compile) @
 hadoop-common ---
 [INFO] Compiling 13 source files to
 C:\Desarrollo\hadoop\hadoop-2.4.0-src\hadoop-common-project\hadoop-common\target\classes
 [INFO]
 [INFO] --- native-maven-plugin:1.0-alpha-7:javah (default) @
 hadoop-common ---
 [INFO] cmd.exe /X /C C:\PROGRA~1\Java\jdk1.6.0_45\bin\javah -d
 C:\Desarrollo\hadoop\hadoop-2.4.0-src\hadoop-common-project\hadoop-common\target\native\javah
 -classpath
 C:\Desarrollo\hadoop\hadoop-2.4.0-src\hadoop-common-project\hadoop-common\target\classes;C:\Desarrollo\hadoop\hadoop-2.4.0-src\hadoop-common-project\hadoop-annotations\target\hadoop-annotations-2.4.0.jar;C:\PROGRA~1\Java\jdk1.6.0_45\jre\..\lib\tools.jar;C:\Desarrollo\maven\repositorio\com\google\guava\guava\11.0.2\guava-11.0.2.jar;C:\Desarrollo\maven\repositorio\commons-cli\commons-cli\1.2\commons-cli-1.2.jar;C:\Desarrollo\maven\repositorio\org\apache\commons\commons-math3\3.1.1\commons-math3-3.1.1.jar;C:\Desarrollo\maven\repositorio\xmlenc\xmlenc\0.52\xmlenc-0.52.jar;C:\Desarrollo\maven\repositorio\commons-httpclient\commons-httpclient\3.1\commons-httpclient-3.1.jar;C:\Desarrollo\maven\repositorio\commons-codec\commons-codec\1.4\commons-codec-1.4.jar;C:\Desarrollo\maven\repositorio\commons-io\commons-io\2.4\commons-io-2.4.jar;C:\Desarrollo\maven\repositorio\commons-net\commons-net\3.1\commons-net-3.1.jar;C:\Desarrollo\maven\repositorio\commons-collections\commons-collections\3.2.1\commons-collections-3.2.1.jar;C:\Desarrollo\maven\repositorio\javax\servlet\servlet-api\2.5\servlet-api-2.5.jar;C:\Desarrollo\maven\repositorio\org\mortbay\jetty\jetty\6.1.26\jetty-6.1.26.jar;C:\Desarrollo\maven\repositorio\org\mortbay\jetty\jetty-util\6.1.26\jetty-util-6.1.26.jar;C:\Desarrollo\maven\repositorio\com\sun\jersey\jersey-core\1.9\jersey-core-1.9.jar;C:\Desarrollo\maven\repositorio\com\sun\jersey\jersey-json\1.9\jersey-json-1.9.jar;C:\Desarrollo\maven\repositorio\org\codehaus\jettison\jettison\1.1\jettison-1.1.jar;C:\Desarrollo\maven\repositorio\com\sun\xml\bind\jaxb-impl\2.2.3-1\jaxb-impl-2.2.3-1.jar;C:\Desarrollo\maven\repositorio\javax\xml\bind\jaxb-api\2.2.2\jaxb-api-2.2.2.jar;C:\Desarrollo\maven\repositorio\javax\xml\stream\stax-api\1.0-2\stax-api-1.0-2.jar;C:\Desarrollo\maven\repositorio\javax\activation\activation\1.1\activation-1.1.jar;C:\Desarrollo\maven\repositorio\org\codehaus\jackson\jackson-jaxrs\1.8.8\jackson-jaxrs-1.8.8.jar;C:\Desarrollo\maven\repositorio\org\codehaus\jackson\jackson-xc\1.8.8\jackson-xc-1.8.8.jar;C:\Desarrollo\maven\repositorio\com\sun\jersey\jersey-server\1.9\jersey-server-1.9.jar;C:\Desarrollo\maven\repositorio\asm\asm\3.2\asm-3.2.jar;C:\Desarrollo\maven\repositorio\commons-logging\commons-logging\1.1.3\commons-logging-1.1.3.jar;C:\Desarrollo\maven\repositorio\log4j\log4j\1.2.17\log4j-1.2.17.jar;C:\Desarrollo\maven

Re: Building hadoop 2.2.1 from source code

2014-06-12 Thread Arpit Agarwal
You may need to add the 'install' target the first time you build (and
every time you build clean thereafter).

Your 'java -version' and 'mvn -version' report different versions of Java.
Check your JAVA_HOME.





On Thu, Jun 12, 2014 at 9:49 AM, Ted Yu yuzhih...@gmail.com wrote:

 Can you run the following command first ?

 mvn clean package -DskipTests

 Here is the version of maven I use:

  mvn -version
 Apache Maven 3.1.1 (0728685237757ffbf44136acec0402957f723d9a; 2013-09-17
 15:22:22+)

 Cheers


 On Thu, Jun 12, 2014 at 9:37 AM, Lukas Drbal lukas.dr...@socialbakers.com
  wrote:

 Hi all,

 I have a problem building Hadoop from source code. I took the git repo from
 https://github.com/apache/hadoop-common, checked out branch-2.2.1 and tried
 mvn package -Pdist,native -DskipTests -Dtar, but it returns a lot of errors.

 Here is log from mvn
 https://gist.github.com/anonymous/052b6d45f64be01dab43


 My environment:

 lestr@drbal:~/data/git/hadoop-common [ branch-2.2.1 ] ∑ java -version
 java version 1.7.0_60
 Java(TM) SE Runtime Environment (build 1.7.0_60-b19)
 Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode)
 lestr@drbal:~/data/git/hadoop-common [ branch-2.2.1 ] ∑ mvn -version
 Apache Maven 3.0.5
 Maven home: /usr/share/maven
 Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
 Java home: /usr/lib/jvm/java-6-oracle/jre
 Default locale: en_US, platform encoding: UTF-8
 OS name: linux, version: 3.14-1-amd64, arch: amd64, family: unix
 lestr@drbal:~/data/git/hadoop-common [ branch-2.2.1 ] ∑


 I need build native libs for 64bit. Can somebody help me please?

 Thanks.
 --


 * Lukáš Drbal*
 Software architect

 *Socialbakers*
 Facebook applications and other sweet stuff

 Facebook Preferred Marketing Developer


 *+420 739 815 424 **lukas.dr...@socialbakers.com
 lukas.dr...@socialbakers.com*
 *www.socialbakers.com http://www.socialbakers.com/*






Re: Questions from a newbie to Hadoop

2014-02-21 Thread Arpit Agarwal
You can try building Apache Hadoop with these instructions:
https://wiki.apache.org/hadoop/Hadoop2OnWindows

32-bit Windows has not been tested.



Re: How to submit the patch MAPREDUCE-4490.patch which works for branch-1.2, not trunk?

2014-02-17 Thread Arpit Agarwal
I have not looked at branch-1 in a while. test-patch.sh appears to be
finicky there. An alternative is to run the full unit test suite locally
and make sure there are no regressions.



On Sun, Feb 16, 2014 at 12:54 AM, sam liu samliuhad...@gmail.com wrote:

 Hi Arpit,

 Thanks for your guidance! As a new contributor, I still have the following
 two questions where I need your help:

 1) So I guess I should run 'test-patch.sh' on my local environment against
 branch-1.2, not on Apache Hadoop test server, right?
 2) On branch-1.2, I found the 'test-patch.sh' is on
 ./src/test/bin/test-patch.sh, not ./dev-support/test-patch.sh. My command
 is 'sh ./src/test/bin/test-patch.sh MAPREDUCE-4490.patch', however failed
 with message 'ERROR: usage ./src/test/bin/test-patch.sh HUDSON [args] |
 DEVELOPER [args]'. What's the correct way to manually run 'test-patch.sh'?




 2014-02-15 5:25 GMT+08:00 Arpit Agarwal aagar...@hortonworks.com:

 Hi Sam,

 Hadoop Jenkins does not accept patches for 1.x.

 You can manually run 'test-patch.sh' to verify there are no regressions
 introduced by your patch and copy-paste the results into a Jira comment.


 On Thu, Feb 13, 2014 at 10:50 PM, sam liu samliuhad...@gmail.com wrote:

 Hi Experts,

 I have been working on the JIRA
 https://issues.apache.org/jira/browse/MAPREDUCE-4490 and attached
 MAPREDUCE-4490.patch which could fix this jira. I would like to contribute
 my patch to community, but encountered some issues.

 MAPREDUCE-4490 is an issue on Hadoop 1.x versions, and my patch is based on
 the latest code of origin/branch-1.2. However, current trunk is based on YARN
 and no longer has this issue. So my patch cannot be applied to current trunk,
 and there is actually no need to generate a similar patch for trunk at all.

 How to submit the patch MAPREDUCE-4490.patch only to origin/branch-1.2,
 not trunk? Is it allowed by Apache Hadoop?

 Thanks!





Re: How to submit the patch MAPREDUCE-4490.patch which works for branch-1.2, not trunk?

2014-02-14 Thread Arpit Agarwal
Hi Sam,

Hadoop Jenkins does not accept patches for 1.x.

You can manually run 'test-patch.sh' to verify there are no regressions
introduced by your patch and copy-paste the results into a Jira comment.


On Thu, Feb 13, 2014 at 10:50 PM, sam liu samliuhad...@gmail.com wrote:

 Hi Experts,

 I have been working on the JIRA
 https://issues.apache.org/jira/browse/MAPREDUCE-4490 and attached
 MAPREDUCE-4490.patch which could fix this jira. I would like to contribute
 my patch to community, but encountered some issues.

 MAPREDUCE-4490 is an issue on Hadoop 1.x versions, and my patch is based on
 the latest code of origin/branch-1.2. However, current trunk is based on YARN
 and no longer has this issue. So my patch cannot be applied to current trunk,
 and there is actually no need to generate a similar patch for trunk at all.

 How to submit the patch MAPREDUCE-4490.patch only to origin/branch-1.2,
 not trunk? Is it allowed by Apache Hadoop?

 Thanks!




Fwd: umount bad disk

2014-02-13 Thread Arpit Agarwal
bcc'ed hadoop-user

Lei, perhaps hbase-user can help.

-- Forwarded message --
From: lei liu liulei...@gmail.com
Date: Thu, Feb 13, 2014 at 1:04 AM
Subject: umount bad disk
To: user@hadoop.apache.org


I use HBase0.96 and CDH4.3.1.

I use Short-Circuit Local Read:

<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/home/hadoop/cdh4-dn-socket/dn_socket</value>
</property>

When one disk is bad, the RegionServer still has some files open on that
disk, so I cannot run umount, for example:
sudo umount -f /disk10
umount2: Device or resource busy

umount: /disk10: device is busy
umount2: Device or resource busy
umount: /disk10: device is busy

I must stop RegionServer in order to run umount command.


How can I remove the bad disk without stopping the RegionServer?

Thanks,

LiuLei



Re: Wiki Editing

2014-02-11 Thread Arpit Agarwal
+common-dev, bcc user


Hi Steve,


*I'm wondering if someone wouldn't mind adding my user to the list so I can
 add my (small) contribution to the project.*


A wiki admin should be able to do this for you (a few of them are on this
mailing list). Feel free to send a reminder to the list if no one has added
you in a day or so.


* Additionally, I'd like to help update the maven site documentation to add
 some clarity, but I know I'll have to look into how to get going on that
 side of the street.  Correct me if I'm wrong, but the process there would
 be to submit bugs with a patch into Jira, and there is probably a utility
 somewhere that I can run which will ensure that whatever changes I propose
 meet the project standards.*


Documentation patches are always welcome. There is a test-patch.sh script
in the source tree which can be used to validate your patch.


Alternatively if you generate your patch against trunk you can cheat and
click 'Submit Patch' in the Jira to have Jenkins validate the patch for
you. To build and stage the site locally you can run something like mvn
site:stage -DstagingDirectory=/tmp/myhadoopsite. This is useful to
manually verify the formatting looks as expected.


Thanks,
Arpit



On Tue, Feb 11, 2014 at 6:01 AM, One Box one...@tabtonic.com wrote:

 I wanted to contribute to the Wiki tonight, but once I created an account
 it shows that all of the pages are immutable.



 I never did receive an email confirmation, but it did allow me to log in.



 After reading through some of the help documentation, I saw that with some
 ASF projects you have to be added to a list of Wiki Editors manually in
 order to prevent spam.



 I'm wondering if someone wouldn't mind adding my user to the list so I can
 add my (small) contribution to the project.



 My login name is SteveKallestad.



 There is a page that spells out instructions for building from source on
 Windows.  I struggled a bit building  on Ubuntu.  I documented the process
 and I'd like to add it.



 Additionally, I'd like to help update the maven site documentation to add
 some clarity, but I know I'll have to look into how to get going on that
 side of the street.  Correct me if I'm wrong, but the process there would
 be to submit bugs with a patch into Jira, and there is probably a utility
 somewhere that I can run which will ensure that whatever changes I propose
 meet the project standards.



 Any help to get me going is appreciated.



 Thanks,

 Steve




Re: HDFS buffer sizes

2014-01-27 Thread Arpit Agarwal
Looks like DistributedFileSystem ignores it though.


On Sat, Jan 25, 2014 at 6:09 AM, John Lilley john.lil...@redpoint.netwrote:

  There is this in FileSystem.java, which would appear to use the default
 buffer size of 4096 in the create() call unless otherwise specified in
 *io.file.buffer.size*



   public FSDataOutputStream create(Path f, short replication,
       Progressable progress) throws IOException {
     return create(f, true,
         getConf().getInt(
             CommonConfigurationKeysPublic.IO_FILE_BUFFER_SIZE_KEY,
             CommonConfigurationKeysPublic.IO_FILE_BUFFER_SIZE_DEFAULT),
         replication,
         getDefaultBlockSize(f), progress);
   }



 But this discussion is missing the point; I really want to know, is there
 any benefit to setting a larger bufferSize in FileSystem.create() and
 FileSystem.append()?



 *From:* Arpit Agarwal [mailto:aagar...@hortonworks.com]
 *Sent:* Friday, January 24, 2014 9:35 AM

 *To:* user@hadoop.apache.org
 *Subject:* Re: HDFS buffer sizes



 I don't think that value is used either except in the legacy block reader
 which is turned off by default.



 On Fri, Jan 24, 2014 at 6:34 AM, John Lilley john.lil...@redpoint.net
 wrote:

 Ah, I see… it is a constant

 CommonConfigurationKeysPublic.java:  public static final int
 IO_FILE_BUFFER_SIZE_DEFAULT = 4096;

 Are there benefits to increasing this for large reads or writes?

 john



 *From:* Arpit Agarwal [mailto:aagar...@hortonworks.com]
 *Sent:* Thursday, January 23, 2014 3:31 PM
 *To:* user@hadoop.apache.org
 *Subject:* Re: HDFS buffer sizes



 HDFS does not appear to use dfs.stream-buffer-size.



 On Thu, Jan 23, 2014 at 6:57 AM, John Lilley john.lil...@redpoint.net
 wrote:

 What is the interaction between dfs.stream-buffer-size and
 dfs.client-write-packet-size?

 I see that the default for dfs.stream-buffer-size is 4K.  Does anyone have
 experience using larger buffers to optimize large writes?

 Thanks


 John
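
A minimal sketch, not part of the original thread, of the two knobs being
discussed: io.file.buffer.size in the client Configuration and the explicit
bufferSize argument to FileSystem.create(). The NameNode host, path, and sizes
are invented for illustration; as Arpit notes above, several of these values
appear to be ignored on the HDFS side, so measure with your own workload
before tuning.

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class BufferSizeSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Knob 1: io.file.buffer.size is read by the create()/open() overloads
      // that do not take an explicit bufferSize (default 4096).
      conf.setInt("io.file.buffer.size", 128 * 1024);

      FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);

      // Knob 2: pass bufferSize explicitly to create(), together with
      // replication and block size.
      FSDataOutputStream out = fs.create(new Path("/tmp/buffer-test.dat"),
          true,                // overwrite
          128 * 1024,          // bufferSize
          (short) 3,           // replication
          128L * 1024 * 1024); // blockSize
      out.write(new byte[1024 * 1024]);
      out.close();
      fs.close();
    }
  }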








Re: HDFS buffer sizes

2014-01-24 Thread Arpit Agarwal
I don't think that value is used either except in the legacy block reader
which is turned off by default.


On Fri, Jan 24, 2014 at 6:34 AM, John Lilley john.lil...@redpoint.netwrote:

  Ah, I see… it is a constant

 CommonConfigurationKeysPublic.java:  public static final int
 IO_FILE_BUFFER_SIZE_DEFAULT = 4096;

 Are there benefits to increasing this for large reads or writes?

 john



 *From:* Arpit Agarwal [mailto:aagar...@hortonworks.com]
 *Sent:* Thursday, January 23, 2014 3:31 PM
 *To:* user@hadoop.apache.org
 *Subject:* Re: HDFS buffer sizes



 HDFS does not appear to use dfs.stream-buffer-size.



 On Thu, Jan 23, 2014 at 6:57 AM, John Lilley john.lil...@redpoint.net
 wrote:

 What is the interaction between dfs.stream-buffer-size and
 dfs.client-write-packet-size?

 I see that the default for dfs.stream-buffer-size is 4K.  Does anyone have
 experience using larger buffers to optimize large writes?

 Thanks


 John








Re: HDFS buffer sizes

2014-01-23 Thread Arpit Agarwal
HDFS does not appear to use dfs.stream-buffer-size.


On Thu, Jan 23, 2014 at 6:57 AM, John Lilley john.lil...@redpoint.netwrote:

  What is the interaction between dfs.stream-buffer-size and
 dfs.client-write-packet-size?

 I see that the default for dfs.stream-buffer-size is 4K.  Does anyone have
 experience using larger buffers to optimize large writes?

 Thanks


 John






Re: Hadoop 1.2.1 or 2.2.0 on Windows - XP-SP2 Not using Cygwin

2014-01-21 Thread Arpit Agarwal
Anand,

Instructions to build Hadoop 2.2 on Windows are at
https://wiki.apache.org/hadoop/Hadoop2OnWindows

Chuck Lam's book is great but out of date wrt Windows support. Windows XP
is not a supported platform. Windows Server 2008 or later is recommended
and Windows Vista is also likely to work.



On Sun, Jan 19, 2014 at 11:10 PM, Anand Murali anand_vi...@yahoo.comwrote:

 Dear All:

 I have been on a discovery cum learning process for the last 2 months,
 trying to install and use the above Hadoop packages using Cygwin on the
 Windows XP platform, after reading a text book (Hadoop in action - Chuck
 Lam), where he has suggested using Cygwin to work with a Unix like
 environment.

 Much to my dismay, I have come across many installation and runtime
 problems with both Cygwin and Hadoop and things have been very unstable. I
 posted this issue on the issue tracker website and was told that Cygwin is
 not supported on both Hadoop releases, however, I could build for a windows
 environment. I am not sure of how it is done and need assistance and
 advise. I shall be thankful for a response and direction. Look forward to
 an early reply.

 Thanks,

  Anand Murali
 11/7, 'Anand Vihar', Kandasamy St, Mylapore
 Chennai - 600 004, India
 Ph: (044)- 28474593/ 43526162 (voicemail)




Re: Building Hadoop 2.2.0 On Windows 7 64-bit

2014-01-21 Thread Arpit Agarwal
Folks, please refer to the wiki page
https://wiki.apache.org/hadoop/Hadoop2OnWindows and also BUILDING.txt in
the source tree. We believe we captured all the prerequisites in
BUILDING.txt so let us know if anything is missing.



On Fri, Jan 17, 2014 at 8:16 AM, Steve Lewis lordjoe2...@gmail.com wrote:

 At least for development work I find that replacing two classes in the
 Hadoop jar (say, putting the following code ahead of the Hadoop jars in a
 project) fixes most Windows issues - at least in my hands.


 On Fri, Jan 17, 2014 at 6:41 AM, Silvina Caíno Lores 
 silvi.ca...@gmail.com wrote:

 Hey again,

 I'm not a Windows user so I'm not very familiar with these issues.
 However, I recall this link
 http://www.srccodes.com/p/article/38/build-install-configure-run-apache-hadoop-2.2.0-microsoft-windows-os
 as a useful source for installation problems I've had on my own; since it's
 for Windows it might help you even further.

 The error

 stdint.h: No such file or directory

  is causing your build to fail, it seems like you don't have the headers
 installed or they aren't properly referenced. Sorry that I can't be of more
 help, I'm not sure how MinGW handles these includes.

 Good luck :D



 On 17 January 2014 15:18, Jian Feng JF She sheji...@cn.ibm.com wrote:

 I have the same environment: Windows 7 (64-bit), Hadoop 2.2. Yes, I have
 installed protobuf previously and, according to the guide, put protoc.exe and
 libprotobuf.lib, libprotobuf-lite.lib, libprotoc.lib into PATH.

 Running protoc --version gives the output libprotoc 2.5.0

 Now it seems everything is ready, but when I run mvn install -DskipTests -e
 I get an error message:

 can not execute: compile-ms-native-dll in
 ..\hadoop-common-project\hadoop-common\pom.xml

 do you have any suggestions?

 Thanks.

 Nikshe



 From: Silvina Caíno Lores silvi.ca...@gmail.com
 To: user@hadoop.apache.org,
 Date: 01/17/2014 06:43 PM

 Subject: Re: Building Hadoop 2.2.0 On Windows 7 64-bit
 --



 'protoc --version' did not return a version

 Are you sure that you have Protocol Buffers installed?



 On 17 January 2014 11:29, Nirmal Kumar 
 *nirmal.ku...@impetus.co.in*nirmal.ku...@impetus.co.in
 wrote:

Hi All,



I am trying to build Hadoop 2.2.0 On Windows 7 64-bit env.



Can you let me know what else is needed for building Hadoop 2.2.0 On
Windows platform?



I am getting the following error building **hadoop-common** project:



[INFO]


[INFO] Building Apache Hadoop Common 2.2.0

[INFO]


[INFO]

[INFO] --- maven-clean-plugin:2.4.1:clean (default-clean) @
hadoop-common ---

[INFO] Deleting
D:\YARN_Setup\hadoop-2.2.0-src\hadoop-common-project\hadoop-common\target

[INFO]

[INFO] --- maven-antrun-plugin:1.6:run (create-testdirs) @
hadoop-common ---

[INFO] Executing tasks



main:

[mkdir] Created dir:

 D:\YARN_Setup\hadoop-2.2.0-src\hadoop-common-project\hadoop-common\target\test-dir

[mkdir] Created dir:

 D:\YARN_Setup\hadoop-2.2.0-src\hadoop-common-project\hadoop-common\target\test\data

[INFO] Executed tasks

[INFO]

[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-os) @
hadoop-common ---

[INFO]

[INFO] --- hadoop-maven-plugins:2.2.0:protoc (compile-protoc) @
hadoop-common ---

[WARNING] [protoc, --version] failed: java.io.IOException: Cannot
run program protoc: CreateProcess error=2, The system cannot find the
file specified

[ERROR] stdout: []

[INFO]


[INFO] BUILD FAILURE

[INFO]


[INFO] Total time: 1.153s

[INFO] Finished at: Fri Jan 17 15:55:10 IST 2014

[INFO] Final Memory: 7M/18M

[INFO]


[ERROR] Failed to execute goal
org.apache.hadoop:hadoop-maven-plugins:2.2.0:protoc (compile-protoc) on
project hadoop-common: org.apache.maven.plugin.MojoExecutionException:
'protoc --version' did not return a version - [Help 1]

[ERROR]

[ERROR] To see the full stack trace of the errors, re-run Maven with
the -e switch.

[ERROR] Re-run Maven using the -X switch to enable full debug
logging.

[ERROR]

[ERROR] For more information about the errors and possible
solutions, please read the following articles:

[ERROR] [Help 1]

 

Re: Windows - Separating etc (config) from bin

2013-12-02 Thread Arpit Agarwal
Ian,

This sounds like a good idea. Please feel free to file a Jira for it.

Arpit


On Fri, Nov 22, 2013 at 10:20 AM, Ian Jackson 
ian_jack...@trilliumsoftware.com wrote:

  It would be nice if HADOOP_CONF_DIR could be set in the environment like
 YARN_CONF_DIR. This could be done in lib-exec\hadoop_config.cmd by setting
 HADOOP_CONF_DIR conditionally.

 if not defined HADOOP_CONF_DIR (
   set HADOOP_CONF_DIR=%HADOOP_HOME%\etc\hadoop
 )



 A similar change might be done in hadoop-config.sh



 Thus the bin could be under Program Files, and Program File be locked down
 from modification, but the configuration files still would be in a separate
 director.



 (--config didn’t seem to work for namenode)




Re: understanding souce code structure

2013-05-27 Thread Arpit Agarwal
It can be overwhelming to jump into the HDFS code. Have you read the
architectural
overview of HDFS https://hadoop.apache.org/docs/r1.0.4/hdfs_design.html?

I found it easiest to start with the DFSClient interface which encapsulates
client operations.

The DFSClient communicates with the namenode using ClientProtocol.
The server side of the ClientProtocol handling is in NameNodeRpcServer.
Client communication with the DataNode is encapsulated in
DataTransferProtocol.

Feel free to ask more specific questions if you get stuck.

-Arpit


On Mon, May 27, 2013 at 11:22 AM, Jay Vyas jayunit...@gmail.com wrote:

 Hi!  a few weeks ago I had the same question... Tried a first iteration at
 documenting this by going through the classes starting with key/value pairs
 in the blog post below.


 http://jayunit100.blogspot.com/2013/04/the-kv-pair-salmon-run-in-mapreduce-hdfs.html

 Note it's not perfect yet but I think it should provide some insight into
 things.  The lynch pin of it all is the DFSOutputStream and the
 DataStreamer classes.   Anyways... Feel free to borrow the contents and
 roll your own, or comment on it, leave some feedback, or let me know if
 anything is missing.

 Definitely would be awesome to have a rock solid view of the full write
 path.

 On May 27, 2013, at 2:10 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote:

 Hello

 I am trying to understand the source code of Hadoop, especially HDFS. I
 want to know where exactly I should look in the source code to see how HDFS
 distributes the data, and also how the MapReduce engine reads the
 data.


 Any hint regarding the location of those in the source code is appreciated.

 Regards,
 Mahmood
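
To make Arpit's pointers concrete, here is a minimal client-side sketch (not
part of the original thread) that exercises the read path he describes. The
host and file name are invented; the useful part is that FileSystem.get()
returns a DistributedFileSystem for an hdfs:// URI, which wraps a DFSClient,
so stepping into open() and read() from here leads into ClientProtocol
(NameNode) and DataTransferProtocol (DataNodes). The write path is the mirror
image through create(), DFSOutputStream and DataStreamer.

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IOUtils;

  public class ReadPathSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Returns a DistributedFileSystem (backed by a DFSClient) for hdfs:// URIs.
      FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);

      // open() fetches block locations from the NameNode via ClientProtocol;
      // the returned stream then reads block data from DataNodes via
      // DataTransferProtocol.
      FSDataInputStream in = fs.open(new Path("/tmp/input.dat"));
      try {
        IOUtils.copyBytes(in, System.out, 4096, false);
      } finally {
        IOUtils.closeStream(in);
      }
      fs.close();
    }
  }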




Re: Is FileSystem thread-safe?

2013-04-01 Thread Arpit Agarwal
Hi John,

DistributedFileSystem is intended to be thread-safe, true to its name.

Metadata operations are handled by the NameNode server which synchronizes
concurrent client requests via locks (you can look at the FSNameSystem
class).

Some discussion on the thread-safety aspects of HDFS:
http://storageconference.org/2010/Papers/MSST/Shvachko.pdf

-Arpit


On Sun, Mar 31, 2013 at 11:52 AM, Ted Yu yuzhih...@gmail.com wrote:

 If you look at DistributedFileSystem source code, you would see that it
 calls the DFSClient field member for most of the actions.
 Requests to Namenode are then made through ClientProtocol.

 An hdfs committer would be able to give you affirmative answer.


 On Sun, Mar 31, 2013 at 11:27 AM, John Lilley john.lil...@redpoint.netwrote:

  *From:* Ted Yu [mailto:yuzhih...@gmail.com]
 *Subject:* Re: Is FileSystem thread-safe?

 FileSystem is an abstract class, what concrete class are you using
 (DistributedFileSystem, etc) ? 

 Good point.  I am calling FileSystem.get(URI uri, Configuration conf)
 with an URI like “hdfs://server:port/…” on a remote server, so I assume it
 is creating a DistributedFileSystem.  However I am not finding any
 documentation discussing its thread-safety (or lack thereof), perhaps you
 can point me to it?

 Thanks, john
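
For readers landing on this thread, a minimal sketch (not part of the original
exchange) of the usage pattern being discussed: one FileSystem instance
obtained via FileSystem.get() and shared across worker threads for metadata
operations. The host and directory names are invented; note that individual
FSDataInputStream/FSDataOutputStream objects are still best confined to a
single thread.

  import java.net.URI;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.TimeUnit;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class SharedFileSystemSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // One shared instance; FileSystem.get() caches by scheme/authority anyway.
      final FileSystem fs =
          FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);

      ExecutorService pool = Executors.newFixedThreadPool(4);
      for (int i = 0; i < 4; i++) {
        final int id = i;
        pool.submit(new Runnable() {
          @Override
          public void run() {
            try {
              // Metadata calls end up at the NameNode, which serializes them.
              FileStatus[] listing = fs.listStatus(new Path("/user/demo" + id));
              System.out.println("thread " + id + ": " + listing.length + " entries");
            } catch (Exception e) {
              e.printStackTrace();
            }
          }
        });
      }
      pool.shutdown();
      pool.awaitTermination(1, TimeUnit.MINUTES);
      fs.close();
    }
  }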





Re: protect from accidental deletes

2013-04-01 Thread Arpit Agarwal
Artem,

In addition, file system snapshots are work in progress.
https://issues.apache.org/jira/browse/HDFS-2802

-Arpit

On Mon, Apr 1, 2013 at 2:15 PM, Mohammad Tariq donta...@gmail.com wrote:

 You could also do a distcp for backing up your HDFS.

 Warm Regards,
 Tariq
 https://mtariq.jux.com/
 cloudfront.blogspot.com


 On Tue, Apr 2, 2013 at 2:43 AM, Mohammad Tariq donta...@gmail.com wrote:

 Hello Artem,

   Make sure your /trash is working fine.

 Warm Regards,
 Tariq
 https://mtariq.jux.com/
 cloudfront.blogspot.com


 On Tue, Apr 2, 2013 at 2:41 AM, Artem Ervits are9...@nyp.org wrote:

  Hello all,


 I’d like to know what users are doing to protect themselves from
 accidental deletes of files and directories in HDFS? Any suggestions are
 appreciated.


 Thanks.

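Following up on the trash suggestion above, a minimal sketch (not part of the
original thread) of deleting through the org.apache.hadoop.fs.Trash client API
instead of calling fs.delete() directly. The host, path, and interval are
invented; in practice fs.trash.interval is normally configured in
core-site.xml rather than in client code, and it must be greater than zero for
trash to be enabled at all.

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.fs.Trash;

  public class TrashDeleteSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Retain trashed files for 24 hours (value is in minutes).
      conf.setLong("fs.trash.interval", 1440);

      FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
      Trash trash = new Trash(fs, conf);

      Path victim = new Path("/user/demo/important-data");
      // Moves the path under the user's .Trash directory so it can still be
      // recovered until the trash checkpoint expires.
      if (!trash.moveToTrash(victim)) {
        System.err.println("Could not move " + victim + " to trash");
      }
      fs.close();
    }
  }

Note that a plain fs.delete() bypasses the trash entirely, which is exactly
the accidental-delete case this thread is about; distcp backups and, once
HDFS-2802 lands, snapshots cover the cases that trash does not.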