Hi Ajay,
Try SequenceFileAsBinaryInputFormat?
Thanks
Rekha
On 11/09/12 11:24 AM, "Ajay Srivastava" wrote:
>Hi,
>
>I am using default inputFormat class for reading input from text files
>but the input file has some non utf-8 characters.
>I guess that TextInputFormat class is default inputForm
Hi,
What happens when an existing (not new) datanode rejoins a cluster in the
following scenarios:
1. Some of the blocks it was managing are deleted/modified?
2. The size of the blocks are now modified say from 64MB to 128MB?
3. What if the block replication factor was one (yea
Hi Mehul
Some of the blocks it was managing are deleted/modified?
The namenode will asynchronously replicate the blocks to other datanodes
in order to maintain the replication factor after a datanode has not
been in contact for 10 minutes.
The size of the blocks are now modified say from
Mehul,
Let me make an addition.
Some of the blocks it was managing are deleted/modified?
Blocks that are deleted in the interim will be deleted on the rejoining
node as well, after it rejoins. Regarding the "modified": I'd advise
against modifying blocks after they have been fully written.
Actually, even if that works, it does not seem an ideal solution.
I think format and encoding are distinct, and enforcing a format must not
enforce an encoding. So there must be a way to pass the encoding as a user
choice on construction,
e.g. TextInputFormat("your-encoding").
But I do
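A minimal sketch of that idea, wrapping the stock LineRecordReader so each line's bytes are decoded with a caller-chosen charset (the class and the "my.input.encoding" property are made up for illustration, not an existing Hadoop API):

import java.io.IOException;
import java.nio.charset.Charset;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

public class EncodingAwareRecordReader extends RecordReader<LongWritable, Text> {
    private final LineRecordReader delegate = new LineRecordReader();
    private final Text value = new Text();
    private Charset charset;

    @Override
    public void initialize(InputSplit split, TaskAttemptContext ctx)
            throws IOException, InterruptedException {
        charset = Charset.forName(
                ctx.getConfiguration().get("my.input.encoding", "UTF-8"));
        delegate.initialize(split, ctx);
    }

    @Override
    public boolean nextKeyValue() throws IOException, InterruptedException {
        if (!delegate.nextKeyValue()) return false;
        Text raw = delegate.getCurrentValue();
        // Text only understands UTF-8, so decode the raw line bytes in the
        // requested charset and re-encode the result as UTF-8.
        value.set(new String(raw.getBytes(), 0, raw.getLength(), charset));
        return true;
    }

    @Override
    public LongWritable getCurrentKey() throws IOException, InterruptedException {
        return delegate.getCurrentKey();
    }

    @Override
    public Text getCurrentValue() { return value; }

    @Override
    public float getProgress() throws IOException, InterruptedException {
        return delegate.getProgress();
    }

    @Override
    public void close() throws IOException { delegate.close(); }
}

A TextInputFormat subclass would then return this reader from createRecordReader().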
George has answered most of these. I'll just add on:
On Tue, Sep 11, 2012 at 12:44 PM, Mehul Choube
wrote:
> 1. Some of the blocks it was managing are deleted/modified?
A DN runs a block report upon start, and sends the list of blocks to
the NN. NN validates them and if it finds any files
Hi Jason,
What Mehmet said is exactly correct: without reducers we cannot
increase performance. You can add mappers and reducers to any data
processing job to get your output with good performance.
Thanks & Regards,
Ramesh.Narasingu
On Tue, Sep 11, 2012 at 9:31 AM, Mehmet Tepedelenliogl
Rekha,
I guess the problem is that the Text class uses UTF-8 encoding and one cannot
set another encoding for this class.
I have not seen any other Text-like class which supports other encodings,
so I have written my own custom input format class instead.
Thanks for your inputs.
Regards,
Ajay Srivastava
Hi Harsh,
Thanks for your reply, and I am sorry for my unclear description.
As I mentioned previously, I think I configured the FairScheduler correctly in
hadoop-0.22.0.
But when I submit lots of jobs:
many big jobs (whose map and reduce counts are bigger than the
map/reduce slots) commit fir
Hi,
What happens when an existing (not new) datanode rejoins a cluster in the
following scenarios:
a) Some of the blocks it was managing are deleted/modified?
b) The size of the blocks are now modified say from 64MB to 128MB?
c) What if the block replication factor was one (yea not in most
deplo
Changing the hostname to lowercase fixed this particular problem - thanks for
your replies.
The build is failing elsewhere now, I'll post a new thread for that.
Tony
From: Tony Burton [mailto:tbur...@sportingindex.com]
Sent: 10 September 2012 10:44
To: user@hadoop.apache.org
Subject: RE: bui
Yes, the cluster will be rebalanced.
On Tue, Sep 11, 2012 at 2:09 PM, mehul choube wrote:
> Hi,
>
>
> What happens when an existing (not new) datanode rejoins a cluster in the
> following scenarios:
>
>
> a) Some of the blocks it was managing are deleted/modified?
>
> b) The size of the blocks are
Hi Mehul,
Please do not send multiple mails with the same questions. We've
already answered this at your other post, follow thread at:
http://mail-archives.apache.org/mod_mbox/hadoop-user/201209.mbox/%3ce884ec9cd547324b8976a5d37317ac566d11c7f...@apj1xchevspin30.symc.symantec.com%3e
On Tue, Sep 11
Hi,
I've checked out the hadoop trunk, and I'm running "mvn test" on the codebase
as part of following the "How To Contribute" guide at
http://wiki.apache.org/hadoop/HowToContribute. The tests are currently failing
in hadoop-mapreduce-client-jobclient, the failure message is below - something
Hi Vinod,
Please check that the input file location and the output file
location do not match. First put your input file into HDFS and
then run the MR job; it should work fine.
Thanks & Regards,
Ramesh.Narasingu
On Tue, Sep 11, 2012 at 4:23 AM, Vinod Kumar Vavilapalli <
vino...@horton
Thanks Harsh. By the looks of Stanislav's LinkedIn profile, he's moved on from
Sungard, so Outlook's filtering rules will look after his list bounce messages
from now on :)
-Original Message-
From: Harsh J [mailto:ha...@cloudera.com]
Sent: 10 September 2012 12:11
To: user@hadoop.apache
Hi,
Please find i think one command is there then only build the all
applications.
Thanks & Regards,
Ramesh.Narasingu
On Tue, Sep 11, 2012 at 2:28 PM, Tony Burton wrote:
> Hi,
>
>
> I’ve checked out the hadoop trunk, and I’m running “mvn test” on the
> codebase as part of fo
> The namenode will asynchronously replicate the blocks to other datanodes in
> order to maintain the replication factor after a datanode has not been in
> contact for 10 minutes.
What happens when the datanode rejoins after namenode has already re-replicated
the blocks it was managing?
Will name
I apologize for this :( I thought the earlier mail didn't go through
-Original Message-
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Tuesday, September 11, 2012 2:31 PM
To: user@hadoop.apache.org
Subject: Re: what happens when a datanode rejoins?
Hi Mehul,
Please do not send multiple
Hi Mehul,
DataNode rejoins take care of only NameNode.
Thanks & Regards,
Ramesh.Narasingu
On Tue, Sep 11, 2012 at 2:36 PM, Mehul Choube wrote:
> > The namenode will asynchronously replicate the blocks to other
> datanodes in order to maintain the replication factor after a datanode h
Hi,
Inline.
On Tue, Sep 11, 2012 at 2:36 PM, Mehul Choube wrote:
>> The namenode will asynchronously replicate the blocks to other datanodes
>> in order to maintain the replication factor after a datanode has not been in
>> contact for 10 minutes.
>
> What happens when the datanode rejoins after
>DataNode rejoins take care of only NameNode.
Sorry, I didn't get this.
From: Narasingu Ramesh [mailto:ramesh.narasi...@gmail.com]
Sent: Tuesday, September 11, 2012 2:38 PM
To: user@hadoop.apache.org
Subject: Re: what happens when a datanode rejoins?
Hi Mehul,
DataNode rejoins take care
Ha, good sleuthing.
I just moved it to INFRA, as no one from our side has gotten to this
yet. I guess we can only moderate, not administrate. So the ticket now
awaits action from INFRA on ejecting it out.
On Tue, Sep 11, 2012 at 2:34 PM, Tony Burton wrote:
> Thanks Harsh. By the looks of Stanisl
Thanks Harsh and Daryn !
Daryn, I tried the option you suggested and changed the getFileSystem(String
user) implementation as follows...
private static FileSystem getFileSystem(String user) throws Exception {
final Configuration conf = new Configuration();
conf.set("fs.default.
Hi Ramesh
Thanks for the quick reply, but I'm having trouble following your English. Are
you saying that there is one command to build everything? If so, can you tell
me what it is?
Tony
From: Narasingu Ramesh [mailto:ramesh.narasi...@gmail.com]
Sent: 11 September 2012 10:06
To: user@hadoop
And done. We shouldn't get this anymore. Thanks for bumping on this issue Tony!
On Tue, Sep 11, 2012 at 2:44 PM, Harsh J wrote:
> Ha, good sleuthing.
>
> I just moved it to INFRA, as no one from our side has gotten to this
> yet. I guess we can only moderate, not administrate. So the ticket now
>
It's probably some Maven thing, in particular Maven's habit of grabbing the
online nightly snapshots off Apache rather than local. Try
mvn clean install -DskipTests --offline
to force in all the artifacts, then run the MR tests.
Tony, why not get on the mapreduce-dev mailing list, as this is the p
Thanks Steve, I’ll try the mvn command you suggest. All the snapshots I can see
came from repository.apache.org though.
How do I run the MR tests only?
Thanks for the mapreduce-dev mailing list suggestion, I thought all lists had
merged into one though – did I get the wrong end of the stick?
T
No problem! I'll remove that Outlook filter now... :)
-Original Message-
From: Harsh J [mailto:ha...@cloudera.com]
Sent: 11 September 2012 10:34
To: user@hadoop.apache.org
Subject: Re: Undeliverable messages
And done. We shouldn't get this anymore. Thanks for bumping on this issue Tony!
I tried this:
cd into hadoop-mapreduce-project
ant test
and got further build errors - I’ll try mapreduce-dev.
From: Tony Burton [mailto:tbur...@sportingindex.com]
Sent: 11 September 2012 10:55
To: user@hadoop.apache.org
Subject: RE: hadoop trunk build failure - yarn, surefire related?
Thanks
Tony,
What I do is:
$ cd hadoop/; mvn install -DskipTests; cd hadoop-mapreduce-project/; mvn test
This seems to run without any missing dependencies, at least.
The user lists are all merged, but the developer lists remain separate
as that works better.
On Tue, Sep 11, 2012 at 3:24 P
Ok - thanks Harsh.
Following Steve's earlier advice I tried the mvn install, which worked fine,
then ant test in hadoop-mapreduce-project which failed. I was mid-email to
mapreduce-dev@hadoop, now I'll try mvn test in hadoop-mapreduce-project and
report back.
Tony
-Original Message-
Ah, there is no longer a need to run ant on trunk. Ignore the
leftover files in the base MR project directory - those should be
cleaned up soon. All of the functional pieces of the build right now
definitely use Maven.
On Tue, Sep 11, 2012 at 4:01 PM, Tony Burton wrote:
> Ok - thanks Harsh
That's good to know - is there a more up to date guide than
http://wiki.apache.org/hadoop/HowToContribute which still makes many references
to ant builds?
-Original Message-
From: Harsh J [mailto:ha...@cloudera.com]
Sent: 11 September 2012 11:36
To: user@hadoop.apache.org
Subject: Re
I guess we'll need to clean that guide up and divide it in two: one for
branch-1 maintenance contributors, and one for trunk contributors.
I had another page that serves a slightly different purpose, but may
help you just the same:
http://wiki.apache.org/hadoop/QwertyManiac/BuildingHadoopTrunk
On Tue, Sep
Also, for the maven based builds, BUILDING.txt in the root folder of hadoop
source does get one started.
Thanks
hemanth
On Tue, Sep 11, 2012 at 4:28 PM, Harsh J wrote:
> I guess we'll need to clean that guide and divide it in two - For
> branch-1 maintenance contributors, and for trunk contribu
Good suggestions Harsh and Hemanth.
When I was asked to submit a patch for hadoop 1.0.3, I thought it a good
exercise to work through the build process to become familiar even though the
patch is documentation-only. Maybe the requests for patches could come with a
list of suggested reading as w
Hi, all
I was wondering: what's the default number of reducers if I don't set it in
the configuration?
Will it change dynamically according to the output volume of the Mapper?
--
YANG, Lin
Hi Lin
The default value for the number of reducers is 1:

<property>
  <name>mapred.reduce.tasks</name>
  <value>1</value>
</property>

It is not determined by data volume. You need to specify the number of
reducers for your mapreduce jobs as per your data volume.
Regards
Bejoy KS
On Tue, Sep 11, 2012 at 4:53 PM, Jason Yang wrote:
> Hi, all
>
> I was
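For completeness, a sketch of setting it explicitly in a job driver (the job name is arbitrary):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class Driver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "my-job"); // Job.getInstance(conf, ...) in later APIs
        job.setNumReduceTasks(4); // overrides the mapred.reduce.tasks default of 1
        // ... set mapper, reducer, input/output paths, then job.waitForCompletion(true)
    }
}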
Hi, Bejoy
Thanks for your reply.
where could I find the default value of mapred.reduce.tasks? I have
checked the core-site.xml, hdfs-site.xml and mapred-site.xml, but I haven't
found it.
2012/9/11 Bejoy Ks
> Hi Lin
>
> The default value for number of reducers is 1
>
> mapred.reduce.tasks
> 1
Dear Madam,
I'm attaching the relevant screenshots of running Hive.
Profile.png shows the contents of /etc/profile.
hive_common_lib.png shows hive_common*.jar is already in $HIVE_HOME/lib; here
$HIVE_HOME is /home/yahoo/hive/build/dist, as evident from classpath_err.png.
Yours Truly
G Sud
Hi Lin
The default values for all the properties are in
core-default.xml
hdfs-default.xml and
mapred-default.xml
Regards
Bejoy KS
On Tue, Sep 11, 2012 at 5:06 PM, Jason Yang wrote:
> Hi, Bejoy
>
> Thanks for your reply.
>
> where could I find the default value of mapred.reduce.tasks ? I have
>
Just to add: the name is deprecated in newer Hadoop.
Try to find
mapreduce.job.reduces
On Tue, Sep 11, 2012 at 9:43 PM, Bejoy Ks wrote:
> Hi Lin
>
> The default values for all the properties are in
> core-default.xml
> hdfs-default.xml and
> mapred-default.xml
>
>
> Regards
> Bejoy KS
>
>
> On Tu
All right, I got it.
Thank you very much~
2012/9/11 Jagat Singh
> Just to add: the name is deprecated in newer Hadoop.
> Try to find
> mapreduce.job.reduces
>
>
>
>
> On Tue, Sep 11, 2012 at 9:43 PM, Bejoy Ks wrote:
>
>> Hi Lin
>>
>> The default values for all the properties are in
>> core-defaul
Hi,
After reviewing the class's (not very complicated) code, I have some
questions I hope someone can answer:
- (more general question) Are there many use-cases for using
DBInputFormat? Do most Hadoop jobs take their input from files or DBs?
- What happens when the database is updated dur
Hi Yaron
Sqoop uses a similar implementation. You can get some details there.
Replies inline
• (more general question) Are there many use-cases for using DBInputFormat? Do
most Hadoop jobs take their input from files or DBs?
> From my small experience, most MR jobs have their data in HDFS. It is usefu
Hi Yaron
Replies inline below.
On 09/11/2012 07:41 AM, Yaron Gonen wrote:
Hi,
After reviewing the class's (not very complicated) code, I have some
questions I hope someone can answer:
* (more general question) Are there many use-cases for using
DBInputFormat? Do most Hadoop jobs take t
Hi,
I want to check whether my understanding of task assignment in Hadoop
is correct.
When scanning a file with multiple tasktrackers,
I am wondering how a task is assigned to each tasktracker.
Is it based on the block sequence or on data locality?
Let me explain my question by example.
The
Thanks for the fast response.
Nick, regarding locking a table: as far as I understood from the code, each
mapper opens its own connection to the DB. I didn't see any code where
the job creates a transaction and passes it to the mapper. Did I
miss something?
again, thanks!
On Tue, Sep 11, 2012
Hi,
Task assignment takes data locality into account first and not block
sequence. In hadoop, tasktrackers ask the jobtracker to be assigned tasks.
When such a request comes to the jobtracker, it will try to look for an
unassigned task which needs data that is close to the tasktracker and will
ass
Hi,
I have a configuration JSON file which is accessed by the MR job for every
input. So I created a class with a static block that loads the JSON file into
a static instance variable.
So every time my mapper or reducer wants to access the configuration, it can
use this instance variable. But on a single node cl
Hi Stuti
You can pass the JSON object as a configuration property from your main
class, then initialize this static JSON object in the configure() method.
Every instance of a map or reduce task will have this configure()
method executed once before the map()/reduce() function. So all the
executions
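A rough sketch of this approach against the old mapred API (the "my.app.json" property name is made up; parsing is left to whatever JSON library is already in use):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class ConfigAwareMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    // Set once per task attempt, before any map() call runs.
    private static String jsonConfig;

    @Override
    public void configure(JobConf job) {
        // The driver would have done conf.set("my.app.json", jsonString)
        // before submitting the job.
        jsonConfig = job.get("my.app.json");
        // parse jsonConfig here with your JSON library of choice
    }

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> out, Reporter reporter)
            throws IOException {
        // ... use the parsed configuration ...
    }
}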
I need to implement secondary sort within an avro based MR sequence. I
however find little to documentation or examples online.
I would like to implement this by overriding the 'int
compare(AvroWrapper x, AvroWrapper y)' method but I fail to have it
invoked.
Does anybody have experience implementi
Another "mvn test" caused the build to fail slightly further down the road. As
my Jira issue is documentation-only, I've submitted the patch anyway.
Is this multiple-failure scenario typical for trying to build hadoop from the
trunk? It's sure putting me off submitting code in future. Is there a
Hi all,
I am running hadoop-0.20.2 on a single node cluster.
I run the command
hadoop fsck /
and it shows an error:
Exception in thread "main" java.net.UnknownHostException: http
at
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
at java.net.SocksSocketImpl.connect
Could you please review your configuration to see if you are pointing to
the right namenode address? (This will be in core-site.xml.)
Please paste it here so we can look for clues.
Thanks
hemanth
On Tue, Sep 11, 2012 at 9:25 PM, yogesh dhari wrote:
> Hi all,
>
> I am running hadoop-0.20.2 on s
Hi Hemant,
It's the content of core-site.xml:

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/hadoop-0.20.2/hadoop_temporary_dirr</value>
  <description>A base for other temporary directories.</description>
</property>
Regards
Yogesh Kumar
Date: Tue, 11 Sep 2012 21:29:36 +0530
Subject: Re:
Hi, thank you for the comment.
> Task assignment takes data locality into account first and not block sequence.
Does it work like that when the replication factor is set to 1?
I just ran an experiment to check the behavior.
There are 14 nodes (node01 to node14) and there are 14 datanodes and
14 tasktrac
Yogesh
try this
hadoop fsck -Ddfs.http.address=localhost:50070 /
50070 is the default http port that the namenode runs on. The property
dfs.http.address should be set in your hdfs-site.xml
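For reference, the equivalent hdfs-site.xml entry would look like this (same host:port as in the command above):

<property>
  <name>dfs.http.address</name>
  <value>localhost:50070</value>
</property>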
--
Arpit Gupta
Hortonworks Inc.
http://hortonworks.com/
On Sep 11, 2012, at 9:03 AM, yogesh dhari wrote
Atop what Arpit has said, the format of dfs.http.address is a simple
host:port, and should not be http://host:port (which you may have set
instead).
On Tue, Sep 11, 2012 at 10:14 PM, Arpit Gupta wrote:
> Yogesh
>
> try this
>
> hadoop fsck -Ddfs.http.address=localhost:50070 /
>
> 50070 is the def
Hi, All
I need to set up a Hadoop/HDFS cluster with one namenode on one machine and
two datanodes on two other machines. But after setting the datanode machines
in the conf/slaves file, running bin/start-dfs.sh cannot start HDFS normally.
I am aware that I have not specify the root directory hadoop is inst
How is security maintained in Hadoop? Is it maintained by giving
folder/file permissions in Hadoop?
How can I make sure that somebody else doesn't write into my HDFS file
system?
Hello all,
I am not clear on the way to remove a datanode from the cluster.
Please explain the decommissioning steps with an example,
like how to create the exclude files and the other steps involved.
Thanks & regards
Yogesh Kumar
By reading the documentation, like the following
http://hadoop.apache.org/docs/r1.0.3/hdfs_permissions_guide.html
On Tue, Sep 11, 2012 at 8:14 PM, nisha wrote:
> How security is maintained in hadoop, is it maintained by giving
> folder/file permissions in hadoop
> how can i make sure that somebo
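As a quick illustration of what that guide covers, HDFS permissions work much like POSIX ones, e.g. (paths and user names below are made up):

% hadoop fs -chown nisha:nisha /user/nisha
% hadoop fs -chmod 700 /user/nisha

Note this only has teeth if permission checking (dfs.permissions) is left enabled on the namenode.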
Hi Yogesh,
Hope this helps.
To remove nodes from the cluster:
1. Add the network addresses of the nodes to be decommissioned to the
exclude file. Do not update the include file at this point.
2. Update the namenode with the new set of permitted datanodes, with
this command:
% hadoop dfsadmin -r
Hi Yogesh
The detailed steps are available in hadoop wiki on FAQ page
http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F
Regards
Bejoy KS
On Wed, Sep 12, 2012 at 12:14 AM, yogesh dhari wrote:
> Hel
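In short, the mechanics those pages describe look like this (the exclude-file path is your choice). In hdfs-site.xml on the namenode:

<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/excludes</value>
</property>

Add the datanode's hostname to that file, then tell the namenode to re-read it:

% hadoop dfsadmin -refreshNodes

and wait for the node to show as "Decommissioned" in the namenode web UI before stopping it.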
Any hints, experts? At least tell me whether I'm on the right track, or whether
we can't use HFTP at all because the browser won't understand it?
On Mon, Sep 10, 2012 at 1:58 PM, Visioner Sadak wrote:
> or should I use the datanode IP for accessing images using HFTP
>
> ftp://localhost:50075/Comb/java1.jpg"/>
>
> my datanode is
Your browser does not know what an HFTP file system is, so it won't work. If you
use WebHDFS, it has REST APIs that you can use to read data from HDFS. I would
suggest looking at those and trying them out.
http://hadoop.apache.org/docs/stable/webhdfs.html
--
Arpit Gupta
Hortonworks Inc.
http://hortonwo
Hi Yaron,
I haven't looked at/used it in awhile but I seem to remember that each
mapper's SQL request was wrapped in a transaction to prevent the number
of rows changing. DBInputFormat uses
Connection.TRANSACTION_SERIALIZABLE from java.sql.Connection to prevent
changes in the number of rows
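For reference, the JDBC call in question is the standard one (connection parameters below are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;

public class IsolationSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://dbhost/mydb", "user", "pass"); // placeholders
        // Serializable isolation keeps the row count stable while the
        // mappers page through the table.
        conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
    }
}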
FAIL
Original Message
Subject: Unsubscribe
From: Kunaal
Date: Tue, September 11, 2012 8:01 pm
To: user@hadoop.apache.org
You should never expose internal host names in the JavaScript/HTML.
The flow can be
Browser --> Tomcat --(REST, HDFS Client)--> HDFS
Your web app can make REST requests to HDFS and you could use JAX-RS impl
for REST talk in your web app.
I must warn that user experience will suffer by any of th
Hello,
I have a simple map/reduce program to merge input files into one big output
file. My question is: is there a way not to output the key from the reducer to
the output file? I only want the value, not the key, for each record.
Thanks
I figured out the cause.
The HDFS block size is 128MB, but
I specified mapred.min.split.size as 512MB,
and data-local I/O processing goes wrong for some reason.
When I remove the mapred.min.split.size configuration,
tasktrackers pick data-local tasks.
Why does this happen?
It seems like a bug.
Split is a
Here's one...
Write a Java program which can be accessed on the server side to pull the
picture from HDFS and display it on your JSP.
On Sep 11, 2012, at 3:48 PM, Visioner Sadak wrote:
> any hints experts atleast if i m on the right track or we cant use hftp at
> all coz the browser wont u
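A minimal sketch of such a servlet, assuming the Hadoop client jars are on the web app's classpath (the namenode address and image path are placeholders):

import java.io.IOException;
import java.io.InputStream;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ImageServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode:9000"); // placeholder
        FileSystem fs = FileSystem.get(conf);
        resp.setContentType("image/jpeg");
        InputStream in = fs.open(new Path("/Comb/java1.jpg")); // placeholder path
        try {
            // Stream the HDFS file straight to the HTTP response
            IOUtils.copyBytes(in, resp.getOutputStream(), 4096, false);
        } finally {
            in.close();
        }
    }
}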
Hi,
I have a sequence file written by SequenceFileOutputFormat with a key/value
type of <Text, BytesWritable>, like below:
Text BytesWritable
-
id_A_01 7F2B3C687F2B3C687F2B3C68
id_A_02 2F2B3C687F2B3C687F2B3C686AB23C68D73C68D7
id_A
Hi,
You have to specify the reducer output key type as NullWritable.
Cheers!
Manoj.
On Wed, Sep 12, 2012 at 7:43 AM, Nataraj Rashmi - rnatar <
rashmi.nata...@acxiom.com> wrote:
> Hello,
>
>
> I have simple map/reduce program to merge input files into one big output
> files. My quest
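A sketch of what that looks like with the new API, assuming Text values (adapt the types to your job):

import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ValueOnlyReducer extends Reducer<Text, Text, NullWritable, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context ctx)
            throws IOException, InterruptedException {
        for (Text v : values) {
            // With a NullWritable key, TextOutputFormat emits only the value,
            // without the key or the separator tab.
            ctx.write(NullWritable.get(), v);
        }
    }
}

The driver should match: job.setOutputKeyClass(NullWritable.class) and job.setOutputValueClass(Text.class).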
Hey Jason,
Is the file pre-sorted? You could override the InputFormat's
#getSplits method to return InputSplits at identified key boundaries,
as one solution - this would require reading the file up-front (at
submit-time) and building the input splits out of it.
On Wed, Sep 12, 2012 at 8:45 AM,
Hi Richard,
If you have installed the hadoop software on the same locations on all
machines and if you have a common user on all the machines, then there
should be no explicit need to specify anything more on the slaves.
Can you tell us whether the above two conditions are true ? If yes, some
mor
Thanks Bejoy,
I will try to implement it, and if I face any issues I will let you know.
Thanks
Stuti
From: Bejoy Ks [mailto:bejoy.had...@gmail.com]
Sent: Tuesday, September 11, 2012 8:39 PM
To: user@hadoop.apache.org
Subject: Re: Issue in access static object in MapReduce
Hi Stuti
You can pass the json obje
Hi Yogesh..
FYI, please go through the following:
http://tech.zhenhua.info/2011/04/how-to-decommission-nodesblacklist.html
http://hadoop-karma.blogspot.in/2011/01/hadoop-cookbook-how-to-decommission.html
From: yogesh dhari [yogeshdh...@live.com]
Sent: Wednesday,
Have you looked at Terracotta or any other distributed caching system?
Kunal
-- Sent while mobile --
On Sep 11, 2012, at 9:30 PM, Stuti Awasthi wrote:
> Thanks Bejoy,
> I try to implement and if face any issues will let you know.
>
> Thanks
> Stuti
>
> From: Bejoy Ks [mailto:bejoy.had...@gm
If the file is pre-sorted, why not just make multiple sequence files -
1 for each split?
Then you don't have to compute InputSplits because the physical files
are already split.
On Tue, Sep 11, 2012 at 11:00 PM, Harsh J wrote:
> Hey Jason,
>
> Is the file pre-sorted? You could override the Outpu
Hi Jason,
I am wondering about the use case of distributing records to mappers on the
basis of key. If possible, could you please share your scenario?
Is it a map-only job? Why not distribute records using a partitioner and do
the processing in reducers?
Regards,
Ajay Srivastava
On 12-Sep-2012, at
Hi,
I tried a similar experiment as yours but couldn't replicate the issue.
I generated 64 MB files and added them to my DFS - one file from every
machine, with a replication factor of 1, like you did. My block size was
64MB. I verified the blocks were located on the same machine as where I
adde
hey guys,
Thanks for all your suggestions.
To wrap up, there are two ways to achieve this:
1. Use multiple sequence files, then write a WholeFileInputFormat which uses
each file as a split by overriding isSplitable() (see the sketch below);
2. Distribute records using a partitioner and do the processing in reducers,
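A sketch of option 1, assuming the new API and sequence files keyed <Text, BytesWritable> as in the earlier mail (the class name follows the thread; note the stock method is spelled isSplitable):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

public class WholeFileInputFormat extends SequenceFileInputFormat<Text, BytesWritable> {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // Each sequence file becomes exactly one split, hence one mapper.
        return false;
    }
}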
Thanks a ton guys for showing the right direction. I was so wrong with
HFTP; I will try out WebHDFS. Is an HDFS FUSE mount a good approach? Using
that, I will have to just mount my existing local Java uploads into HDFS.
But can I access HAR files using this, or will I have to create a symlink
for ac