Hi Zesheng,
I learned from an offline email of yours that your Hadoop version is 2.0.0-alpha,
and you also said “The block is allocated successfully in the NN, but isn’t created
in the DN”.
Yes, we may have this issue in 2.0.0-alpha. I suspect your issue is similar
to HDFS-4516. Can you try
Hi,
I have downloaded hadoop-2.5.0 and am trying to get it working for s3
backend *(single-node in a pseudo-distributed mode)*.
I have made changes to the core-site.xml according to
https://wiki.apache.org/hadoop/AmazonS3
I have a backend object store running on my machine that supports S3.
I
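(For anyone following along: with the jets3t-based s3n connector in 2.5.0, the minimal core-site.xml pieces are the credential properties below. This is a sketch; the key values and any bucket names are placeholders, not from the original mail.)
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_KEY</value>
</property>
Files are then addressed as s3n://YOUR_BUCKET/path.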
Thanks Yi, I will look into HDFS-4516.
2014-09-10 15:03 GMT+08:00 Liu, Yi A yi.a@intel.com:
Hi Zesheng,
I learned from an offline email of yours that your Hadoop version is
2.0.0-alpha, and you also said “The block is allocated successfully in the NN,
but isn’t created in the DN”.
Yes, we
Hi Experts,
My Hadoop cluster has HA enabled with QJM, and I failed to upgrade it from
version 2.2.0 to 2.4.1. Why? Is this an existing issue?
My steps:
1. Stop hadoop cluster
2. On each node, upgrade hadoop binary with the newer version
3. On each JournalNode:
sbin/hadoop-daemon.sh start journalnode
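(For reference, the steps that usually follow for an HA + QJM upgrade on 2.4.x; this is a sketch of the sequence as I understand it, not a verified procedure:)
# 4. On the active NameNode, start with the upgrade flag:
sbin/hadoop-daemon.sh start namenode -upgrade
# 5. On the standby NameNode, re-sync from the upgraded active:
bin/hdfs namenode -bootstrapStandby
# 6. Start the DataNodes:
sbin/hadoop-daemon.sh start datanode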
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or
dfs.namenode.rpc-address is not configured.
Starting namenodes on []
NameNode/DataNode are part of an HDFS service. It makes no sense to try
and run them with an S3 URL as the default filesystem; HDFS is itself a
distributed filesystem in
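(In practice that means leaving fs.defaultFS pointing at hdfs:// and addressing the object store explicitly per command, e.g. the following, where the bucket name is a placeholder:)
hadoop fs -ls s3n://YOUR_BUCKET/path/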
Thank you all for your support.
I was able to fix the issue this morning using this link; it is clearly explained there.
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#java-io-ioexception-incompatible-namespaceids
You can use the link as well.
Warm regards
From:
Hello,
I developed a custom compression codec for Hadoop. Of course Hadoop is set to
use my codec when compressing data.
For testing purposes, I use the following two commands:
Compression test command:
---
hadoop jar
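(Not from the original mail, but for context: a custom codec normally has to be registered in core-site.xml before a compression test will pick it up. The class name below is a placeholder:)
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,com.example.MyCustomCodec</value>
</property>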
I want to unsubscribe from this mailing list
On Wed, Sep 10, 2014 at 4:42 PM, Charles Robertson
charles.robert...@gmail.com wrote:
Hi all,
Is it possible to use regular expressions in fs commands? Specifically, I
want to use the copy (-cp) and move (-mv) commands on all files in a
Yes you can:
hadoop fs -ls /tmp/myfiles*
I would recommend first using -ls in order to verify you are selecting
the right files.
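(Once -ls confirms the right files, the same glob works with -cp and -mv; quote it so your local shell doesn't expand it first. The paths below are illustrative:)
hadoop fs -cp '/tmp/myfiles*' /tmp/archive/
hadoop fs -mv '/tmp/myfiles*' /tmp/processed/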
#Mahesh: do you need some help doing this?
On 10.09.2014 13:46, Mahesh Khandewal wrote:
I want to unsubscribe from this mailing list
On Wed, Sep 10, 2014 at
Hi Yi,
I went through HDFS-4516, and it really solves our problem, thanks very
much!
2014-09-10 16:39 GMT+08:00 Zesheng Wu wuzeshen...@gmail.com:
Thanks Yi, I will look into HDFS-4516.
2014-09-10 15:03 GMT+08:00 Liu, Yi A yi.a@intel.com:
Hi Zesheng,
I got from an offline email of
Hi Georgi,
Thanks for your reply. Won't hadoop fs -ls /tmp/myfiles* return all files
that begin with 'myfiles' in the tmp directory? What I don't understand is
how I can specify a pattern that excludes files ending in '.tmp'. I have
tried using the normal regular expression syntax for this
That’s great.
Regards,
Yi Liu
From: Zesheng Wu [mailto:wuzeshen...@gmail.com]
Sent: Wednesday, September 10, 2014 8:25 PM
To: user@hadoop.apache.org
Subject: Re: HDFS: Couldn't obtain the locations of the last block
Hi Yi,
I went through HDFS-4516, and it really solves our problem, thanks very
Hello Hadoopers,
Here is the error I'm facing when running the WordCount example program I wrote
myself.
Kindly find attached the file of my WordCount program.
Below is the error.
*hdfs://latdevweb02:9000/home/hadoop/hadoop/input*
is this a valid path on HDFS? Can you access this path outside of the
program, for example using the hadoop fs -ls command? Also, were this path and
the files in it created by a different user?
The exception seems to say that it does not exist or the
Hi, have you set a class in your code?
WARN mapred.JobClient: No job jar file set. User classes may not be found.
See JobConf(Class) or JobConf#setJar(String).
Also you need to check the path for your input file
Input path does not exist: hdfs://latdevweb02:9000/home/hadoop/hadoop/input
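(For completeness, the usual fix for that warning with the new API is Job#setJarByClass. A minimal driver sketch follows; the class name and the commented mapper/reducer lines are placeholders for the classes in the attachment.)
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "wordcount");
    job.setJarByClass(WordCount.class);  // tells Hadoop which jar to ship; fixes "No job jar file set"
    // job.setMapperClass(YourMapper.class);    // plug in the classes from your own program
    // job.setReducerClass(YourReducer.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // must already exist in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // must NOT exist yet
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}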
Hello,
I am getting the following error when running on a 500 MB dataset compressed in
the Avro data format.
Container [pid=22961,containerID=container_1409834588043_0080_01_10] is
running beyond virtual memory limits. Current usage: 636.6 MB of 1 GB
physical memory used; 2.1 GB of 2.1 GB virtual
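(That message is YARN's virtual-memory check: 2.1 GB is exactly 1 GB of container memory times the default yarn.nodemanager.vmem-pmem-ratio of 2.1. Two common mitigations, with illustrative values:)
<!-- yarn-site.xml: relax the virtual memory ratio -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>
<!-- mapred-site.xml: or give the map containers more physical memory -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>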
Hi,
That is indeed my real problem.
Could you please look into my attached code and tell me how I can update
it?
How do I set a job jar file?
And now, here is my hdfs-site.xml
==
-bash-4.1$ cat conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
Hi,
In fact,
hdfs://latdevweb02:9000/home/hadoop/hadoop/input
is not a folder on hdfs.
I created a folder /tmp/hadoop-hadoop/dfs/data, where data will be saved in
hdfs.
And in my HADOOP_HOME folder there are two folders, “input” and “output”, but I
don’t know how to configure them in the
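(If I read this right, the job expects its input in HDFS, not in the local folder under HADOOP_HOME. A minimal sketch, reusing the path from the earlier error message:)
hadoop fs -mkdir -p /home/hadoop/hadoop/input
hadoop fs -put $HADOOP_HOME/input/* /home/hadoop/hadoop/input/
hadoop fs -ls /home/hadoop/hadoop/input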
Hi,
I am trying the smoke test for Hadoop (2.4.1). Regarding “terasort”, below is my
test command. The Map part completed very fast because it was split into
many subtasks; however, the Reduce part takes a very long time and has only 1 running
Reduce task. Is there a way to speed up the reduce phase by
HDFS doesn't support the full range of glob matching you will find in Linux.
If you want to exclude all files from a directory listing that meet
certain criteria, try doing your listing and using grep -v to exclude the
matching records.
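(Concretely, something like the following; the pattern and path are illustrative, and the awk keeps only the path column of the listing:)
hadoop fs -ls /tmp/myfiles* | grep -v '\.tmp$' | awk '{print $NF}'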
Hello!
Imagine the following common task: I want to process a big text file line by line
using the streaming interface.
Run the Unix grep command, for instance, or some other line-by-line processing,
e.g. line.upper().
I copy the file to HDFS.
Then I run a map task on this file which reads one line,
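(A concrete streaming invocation for the grep case might look like this; the jar path and pattern are placeholders. Note that grep exits non-zero when it finds nothing, which streaming treats as a task failure unless you wrap it:)
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -D mapred.reduce.tasks=0 \
  -input /data/bigfile.txt \
  -output /data/grep-out \
  -mapper 'grep some_pattern'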
You can set the number of reducers used in any hadoop job from the command
line by using -Dmapred.reduce.tasks=XX.
e.g. hadoop jar hadoop-mapreduce-examples.jar terasort
-Dmapred.reduce.tasks=10 /terasort-input /terasort-output
If you don't want the key in the final output, you can set it like this in Java.
job.setOutputKeyClass(NullWritable.class);
It will just print the value in the output file.
I don't know how to do it in Python.
On 9/10/14, Dmitry Sivachenko trtrmi...@gmail.com wrote:
Hello!
Imagine the following common
In python, or any streaming program just set the output value to the empty
string and you will get something like key\t.
On Wed, Sep 10, 2014 at 12:03 PM, Susheel Kumar Gadalay skgada...@gmail.com
wrote:
If you don't want the key in the final output, you can set it like this in Java.
On 10 Sep 2014, at 22:05, Rich Haase rdha...@gmail.com wrote:
In python, or any streaming program just set the output value to the empty
string and you will get something like key\t.
I see, but I want to use many existing programs (like UNIX grep), and I don't
want to have an extra
You can write a custom output format, or you can write your mapreduce job
in Java and use a NullWritable as Susheel recommended.
grep (and every other *nix text-processing command I can think of) would
not be bothered by a trailing tab character. It's even quite easy to strip
away that tab
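(e.g. a trailing sed pass when pulling results back out; GNU sed understands \t, with BSD sed use a literal tab. Paths are illustrative:)
hadoop fs -cat /data/grep-out/part-* | sed 's/\t$//' > results.txt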
On 10 Sep 2014, at 22:19, Rich Haase rdha...@gmail.com wrote:
You can write a custom output format
Any clues how this can be done?
, or you can write your mapreduce job in Java and use a NullWritable as
Susheel recommended.
grep (and every other *nix text processing
Examples (the top ones are related to streaming jobs):
http://www.infoq.com/articles/HadoopOutputFormat
http://research.neustar.biz/2011/08/30/custom-inputoutput-formats-in-hadoop-streaming/
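(To sketch the first option concretely: a value-only output format for a streaming job, written against the old mapred API that streaming uses. The package and class names are mine and this is an untested sketch:)
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Progressable;

public class ValueOnlyOutputFormat extends FileOutputFormat<Text, Text> {
  @Override
  public RecordWriter<Text, Text> getRecordWriter(FileSystem ignored, JobConf job,
      String name, Progressable progress) throws IOException {
    Path file = FileOutputFormat.getTaskOutputPath(job, name);
    final FSDataOutputStream out = file.getFileSystem(job).create(file, progress);
    return new RecordWriter<Text, Text>() {
      public void write(Text key, Text value) throws IOException {
        out.write(value.getBytes(), 0, value.getLength());  // value only: no key, no tab
        out.write('\n');
      }
      public void close(Reporter reporter) throws IOException {
        out.close();
      }
    };
  }
}
Then pass it to streaming with -libjars valueonly.jar -outputformat com.example.ValueOnlyOutputFormat.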
On 10 Sep 2014, at 22:47, Shahab Yunus shahab.yu...@gmail.com wrote:
Examples (the top ones are related to streaming jobs):
http://www.infoq.com/articles/HadoopOutputFormat
http://research.neustar.biz/2011/08/30/custom-inputoutput-formats-in-hadoop-streaming/
Use ‘tr -s’ to squeeze out the repeated tabs?
$ echo -e "a\t\t\tb"
a			b
$ echo -e "a\t\t\tb" | tr -s "\t"
a	b
On Sep 10, 2014, at 11:28 AM, Dmitry Sivachenko trtrmi...@gmail.com wrote:
On 10 Sep 2014, at 11:28 AM, Dmitry Sivachenko trtrmi...@gmail.com wrote — quoting, from 22:19, Rich Haase rdha...@gmail.com:
You can write a custom
If you don’t want anything to get inserted, just set your output to key only or
value only.
TextOutputFormat$LineRecordWriter won’t insert anything unless both values are
set:
public synchronized void write(K key, V value) throws IOException {
  boolean nullKey = key == null || key instanceof NullWritable;
On 11 Sep 2014, at 0:47, Felix Chern idry...@gmail.com wrote:
If you don’t want anything to get inserted, just set your output to key only or
value only.
TextOutputFormat$LineRecordWriter won’t insert anything unless both values
are set:
If I output value only, for instance, and my line
I solved this in the end by using a shell script (initiated by an Oozie
shell action) to use grep and loop through the results. I didn't have to use
the -v option, as the -e option gives you access to a fuller range of regular
expression functionality.
Thanks for your help (again!) Rich.
Charles
On
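(For the archives, the shape of that approach is roughly the following; the paths and pattern here are invented, not Charles's actual script:)
for f in $(hadoop fs -ls /data/in | awk '{print $NF}' | grep -e '/myfiles-[0-9]*$'); do
  hadoop fs -mv "$f" /data/out/
done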
Hi experts,
I faced one strange issue I cannot understand; can you guys tell me whether this
is a bug or I configured something wrong? Below is my situation.
I'm running the Hadoop 2.2.0 release and all my jobs are uberized; each
node can only run a single job at a point in time. I used
hadoop 2.4.1
Balancing is very slow.
$HADOOP_PREFIX/bin/hdfs dfsadmin -setBalancerBandwidth 52428800
It takes a long time to move one block.
2014. 09. 11. 11:38:01 Block begins to move
2014-09-11 11:47:20 Complete block move
#10.2.1.211 netstat, Block begins to move, 10.2.1.210
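(Worth noting: dfsadmin -setBalancerBandwidth takes effect immediately but does not survive a DataNode restart; the persistent setting in hdfs-site.xml is the property below, in bytes/sec, here 50 MB/s:)
<property>
  <name>dfs.datanode.balance.bandwidthPerSec</name>
  <value>52428800</value>
</property>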