Java/WebHDFS - upload with Snappy compression on the fly

2014-07-14 Thread Michał Michalak
Hello

I need to upload a large file using WebHDFS (from local disk into HDFS;
WebHDFS is my only option, I don't have direct access). Because in my case
the network connection is the bottleneck, I decided to compress the file with
Snappy before sending. I am using a Java application, compiled with the
"org.apache.hadoop:hadoop-client:2.4.0" library.

So far, my code looks like this:

private void uploadFile(Path hdfsPath, FileSystem fileSystem) throws IOException {
    // Input file reader
    BufferedReader bufferedReader =
            new BufferedReader(new FileReader(localFile), INPUT_STREAM_BUFFER_SIZE);

    // Output file writer
    FSDataOutputStream hdfsDataOutputStream =
            fileSystem.create(hdfsPath, false, OUTPUT_STREAM_BUFFER_SIZE);
    SnappyOutputStream snappyOutputStream =
            new SnappyOutputStream(hdfsDataOutputStream, OUTPUT_STREAM_BUFFER_SIZE);
    BufferedWriter bufferedWriter =
            new BufferedWriter(new OutputStreamWriter(snappyOutputStream, "UTF-8"));

    String line;
    while ((line = bufferedReader.readLine()) != null) {
        bufferedWriter.write(line);
    }

    bufferedReader.close();
    bufferedWriter.close();
}

Basically it works: the Snappy-compressed file is uploaded to HDFS, yet there
seems to be a problem with the Snappy format itself. The file is not recognized
as Snappy-compressed by Hadoop. I compared my compressed file with another one
compressed by Hadoop; the main compressed stream seems to be the same in both
files, but the headers are different.
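
I suspect the framing written by snappy-java's SnappyOutputStream is simply
not the same as the framing of Hadoop's own SnappyCodec. If that is the cause,
a sketch along the following lines (untested on my side, assuming the Hadoop
native Snappy libraries are available on the client, and reusing the localFile
and OUTPUT_STREAM_BUFFER_SIZE fields from the code above) should produce a
file that Hadoop recognizes:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.SnappyCodec;

// Sketch: compress with Hadoop's SnappyCodec instead of snappy-java, so the
// file on HDFS uses the framing that Hadoop itself expects to read back.
private void uploadFileWithHadoopCodec(Path hdfsPath, FileSystem fileSystem,
                                       Configuration conf) throws IOException {
    SnappyCodec codec = new SnappyCodec();
    codec.setConf(conf); // the codec needs a Configuration to load native Snappy

    InputStream in = new BufferedInputStream(new FileInputStream(localFile));
    OutputStream out = codec.createOutputStream(
            fileSystem.create(hdfsPath, false, OUTPUT_STREAM_BUFFER_SIZE));
    try {
        // Copy raw bytes; no Reader/Writer round-trip is needed for compression.
        IOUtils.copyBytes(in, out, conf, false);
    } finally {
        IOUtils.closeStream(in);
        IOUtils.closeStream(out); // flushes and closes the underlying HDFS stream
    }
}

If downstream jobs rely on automatic codec detection, it should also help to
give the target file a ".snappy" extension so that CompressionCodecFactory can
pick the codec by suffix (as far as I understand).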

What am I doing wrong? Would you be so kind as to suggest a solution for my
issue?

Best Regards
Michal Michalak


default 8 mappers per host?

2014-07-14 Thread Sisu Xi
Hi, all:

I configured a Hadoop cluster with 9 hosts, each with 2 VCPUs and 4 GB of RAM.

I noticed that when I run the example pi program, all hosts become busy only
when I configure it with at least 8*9=72 mappers.
Does that mean there is a default of 8 mappers per host?

How is this value decided? And where can I change it?

Thanks very much!

Sisu

-- 


*Sisu Xi, PhD Candidate*

http://www.cse.wustl.edu/~xis/
Department of Computer Science and Engineering
Campus Box 1045
Washington University in St. Louis
One Brookings Drive
St. Louis, MO 63130


Re: Not able to place enough replicas

2014-07-14 Thread Yanbo Liang
Maybe the user 'test' does not have write permission.
You can refer to the ERROR log, for example:
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:test (auth:SIMPLE)
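
If that is the case, something like the following (the path below is just a
placeholder for the directory being written to) will show its owner and
permissions:

hadoop fs -ls /path/to/parent/dir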


2014-07-15 2:07 GMT+08:00 Bogdan Raducanu :

> I'm getting this error while writing many files.
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not
> able to place enough replicas, still in need of 4 to reach 4
>
> I've set logging to DEBUG but still there is no reason printed. There
> should've been a reason after this line but instead there's just an empty
> line.
> Has anyone seen something like this before? It is seen on a 4 node cluster
> running hadoop 2.2
>
>
> org.apache.hadoop.hdfs.StateChange: *DIR* NameNode.create: file /file_1002
> for DFSClient_NONMAPREDUCE_839626346_1 at 192.168.180.1
> org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile:
> src=/file_1002, holder=DFSClient_NONMAPREDUCE_839626346_1,
> clientMachine=192.168.180.1, createParent=true, replication=4,
> createFlag=[CREATE, OVERWRITE]
> org.apache.hadoop.hdfs.StateChange: DIR* addFile: /file_1002 is added
> org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: add
> /file_1002 to namespace for DFSClient_NONMAPREDUCE_839
> << ... many other operations ... >>
> 8 seconds later:
> org.apache.hadoop.hdfs.StateChange: *BLOCK* NameNode.addBlock: file
> /file_1002 fileId=189252 for DFSClient_NONMAPREDUCE_839626346_1
> org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getAdditionalBlock:
> file /file_1002 for DFSClient_NONMAPREDUCE_839626346_1
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not
> able to place enough replicas, still in need of 4 to reach 4
> << EMPTY LINE >>
> org.apache.hadoop.security.UserGroupInformation:
> PriviledgedActionException as:test (auth:SIMPLE) cause:java.io.IOException:
> File /file_1002 could only be replicated to 0 nodes instead of
> minReplication (=1).  There are 4 datanode(s) running and no node(s) are
> excluded in this operation.
> org.apache.hadoop.ipc.Server: IPC Server handler 9 on 8020, call
> org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from
> 192.168.180.1:49592 Call#1321 Retry#0: error: java.io.IOException: File
> /file_1002 could only be replicated to 0 nodes instead of minReplication
> (=1).  There are 4 datanode(s) running and no node(s) are excluded in this
> operation.
> java.io.IOException: File /file_1002 could only be replicated to 0 nodes
> instead of minReplication (=1).  There are 4 datanode(s) running and no
> node(s) are excluded in this operation.
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042):0
>
>


Re: clarification on HBASE functionality

2014-07-14 Thread Ted Yu
Right.
HBase is different from Cassandra in this regard.


On Mon, Jul 14, 2014 at 2:57 PM, Adaryl "Bob" Wakefield, MBA <
adaryl.wakefi...@hotmail.com> wrote:

>   Now this is different from Cassandra which does NOT use HDFS correct?
> (Sorry. Don’t know why that needed two emails.)
>
> B.
>
>  *From:* Ted Yu 
> *Sent:* Monday, July 14, 2014 4:53 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: clarification on HBASE functionality
>
>  Yes.
> See http://hbase.apache.org/book.html#arch.hdfs
>
>
> On Mon, Jul 14, 2014 at 2:52 PM, Adaryl "Bob" Wakefield, MBA <
> adaryl.wakefi...@hotmail.com> wrote:
>
>> HBASE uses HDFS to store it's data correct?
>>
>> B.
>>
>
>


Re: clarification on HBASE functionality

2014-07-14 Thread Adaryl "Bob" Wakefield, MBA
Now, this is different from Cassandra, which does NOT use HDFS, correct? (Sorry,
I don't know why that needed two emails.)

B.

From: Ted Yu 
Sent: Monday, July 14, 2014 4:53 PM
To: user@hadoop.apache.org
Subject: Re: clarification on HBASE functionality

Yes. 
See http://hbase.apache.org/book.html#arch.hdfs



On Mon, Jul 14, 2014 at 2:52 PM, Adaryl "Bob" Wakefield, MBA 
 wrote:

  HBASE uses HDFS to store it's data correct?

  B.



Re: clarification on HBASE functionality

2014-07-14 Thread Ted Yu
Yes.
See http://hbase.apache.org/book.html#arch.hdfs
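
For instance, in a typical fully-distributed setup hbase-site.xml points the
HBase root directory at HDFS (the hostname and port below are placeholders):

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.com:8020/hbase</value>
  </property>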


On Mon, Jul 14, 2014 at 2:52 PM, Adaryl "Bob" Wakefield, MBA <
adaryl.wakefi...@hotmail.com> wrote:

> HBASE uses HDFS to store it's data correct?
>
> B.
>


clarification on HBASE functionality

2014-07-14 Thread Adaryl "Bob" Wakefield, MBA

HBase uses HDFS to store its data, correct?

B.


Re: OIV Compatibility

2014-07-14 Thread Harsh J
There shouldn't be any - it basically streams over the existing local
fsimage file.

On Tue, Jul 15, 2014 at 12:21 AM, Ashish Dobhal
 wrote:
> Sir I tried it it works. Are there any issues in downloading the gsimage
> using wget.
>
>
> On Tue, Jul 15, 2014 at 12:17 AM, Harsh J  wrote:
>>
>> Sure, you could try that. I've not tested that mix though, and OIV
>> relies on some known formats support, but should hopefully work.
>>
>> On Mon, Jul 14, 2014 at 11:56 PM, Ashish Dobhal
>>  wrote:
>> > Could I download the fsimage of a hadoop 1.0 using wget and then
>> > interpret
>> > it in offline mode using the tool in the hadoop 1.2 or higher
>> > distributions.I guess the structure of fsimage would be same for both
>> > the
>> > distributions.
>> >
>> >
>> > On Mon, Jul 14, 2014 at 11:53 PM, Ashish Dobhal
>> > 
>> > wrote:
>> >>
>> >> Harsh thanks
>> >>
>> >>
>> >> On Mon, Jul 14, 2014 at 11:39 PM, Harsh J  wrote:
>> >>>
>> >>> The OIV for 1.x series is available in release 1.2.0 and higher. You
>> >>> can use it from the 'hadoop oiv' command.
>> >>>
>> >>> It is not available in 1.0.x.
>> >>>
>> >>> On Mon, Jul 14, 2014 at 9:49 PM, Ashish Dobhal
>> >>>  wrote:
>> >>> > Hey everyone ;
>> >>> > Could anyone tell me how to use the OIV tool in hadoop 1.0 as there
>> >>> > is
>> >>> > no
>> >>> > hdfs.sh file there.
>> >>> > Thanks.
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Harsh J
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J


Re: changing split size in Hadoop configuration

2014-07-14 Thread Bertrand Dechoux
For what it's worth, mapreduce.jobtracker.split.metainfo.maxsize is related
to the size of the file containing the information describing the input
splits. It is not related directly to the volume of data but to the number
of splits, which might explode when using too many (small) files. It's
basically a safeguard. Alternatively, you might want to reduce the number
of splits; raising the block size is one way to do it.
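
For example (only a sketch with the 1.x property name, so please double-check
against your distribution), raising the minimum split size in mapred-site.xml
makes FileInputFormat pack more data into each split of a large file, which
lowers the split count; note that it does not merge separate small files:

  <property>
    <name>mapred.min.split.size</name>
    <!-- 256 MB here; pick a value that suits your data -->
    <value>268435456</value>
  </property>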

Bertrand Dechoux


On Mon, Jul 14, 2014 at 7:50 PM, Adam Kawa  wrote:

> It sounds like JobTracker setting, so the restart looks to be required.
>
> You verify it in pseudo-distributed mode by setting it to a very low
> value, restarting JT and seeing if you get the exception that prints this
> new value.
>
> Sent from my iPhone
>
> On 14 jul 2014, at 16:03, Jan Warchoł  wrote:
>
> Hello,
>
> I recently got "Split metadata size exceeded 1000" error when running
> Cascading jobs with very big joins.  I found that I should change
> mapreduce.jobtracker.split.metainfo.maxsize property in hadoop
> configuration by adding this to the mapred-site.xml file:
>
>   <property>
>     <name>mapreduce.jobtracker.split.metainfo.maxsize</name>
>     <value>10</value>
>   </property>
>
> but it didn't seem to have any effect - I'm probably doing something wrong.
>
> Where should I add this change so that is has the desired effect?  Do I
> understand correctly that jobtracker restart is required after making the
> change? The cluster I'm working on has Hadoop 1.0.4.
>
> thanks for any help,
> --
> *Jan Warchoł*
> *Software Engineer*
> 
>
> -
> M: +48 509 078 203
>  E: jan.warc...@codilime.com
> -
> CodiLime Sp. z o.o. - Ltd. company with its registered office in Poland,
> 01-167 Warsaw, ul. Zawiszy 14/97. Registered by The District Court for the
> Capital City of Warsaw, XII Commercial Department of the National Court
> Register. Entered into National Court Register under No. KRS 388871.
> Tax identification number (NIP) 5272657478. Statistical number
> (REGON) 142974628.
>
>


Re: OIV Compatibility

2014-07-14 Thread Ashish Dobhal
Sir, I tried it and it works. Are there any issues in downloading the fsimage
using wget?


On Tue, Jul 15, 2014 at 12:17 AM, Harsh J  wrote:

> Sure, you could try that. I've not tested that mix though, and OIV
> relies on some known formats support, but should hopefully work.
>
> On Mon, Jul 14, 2014 at 11:56 PM, Ashish Dobhal
>  wrote:
> > Could I download the fsimage of a hadoop 1.0 using wget and then
> interpret
> > it in offline mode using the tool in the hadoop 1.2 or higher
> > distributions.I guess the structure of fsimage would be same for both the
> > distributions.
> >
> >
> > On Mon, Jul 14, 2014 at 11:53 PM, Ashish Dobhal <
> dobhalashish...@gmail.com>
> > wrote:
> >>
> >> Harsh thanks
> >>
> >>
> >> On Mon, Jul 14, 2014 at 11:39 PM, Harsh J  wrote:
> >>>
> >>> The OIV for 1.x series is available in release 1.2.0 and higher. You
> >>> can use it from the 'hadoop oiv' command.
> >>>
> >>> It is not available in 1.0.x.
> >>>
> >>> On Mon, Jul 14, 2014 at 9:49 PM, Ashish Dobhal
> >>>  wrote:
> >>> > Hey everyone ;
> >>> > Could anyone tell me how to use the OIV tool in hadoop 1.0 as there
> is
> >>> > no
> >>> > hdfs.sh file there.
> >>> > Thanks.
> >>>
> >>>
> >>>
> >>> --
> >>> Harsh J
> >>
> >>
> >
>
>
>
> --
> Harsh J
>


Re: OIV Compatibility

2014-07-14 Thread Harsh J
Sure, you could try that. I've not tested that mix, though, and OIV
relies on support for a few known formats, but it should hopefully work.
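
If you go the wget route, on the 1.x line the NameNode's HTTP server should
serve the image at something like the following (hostname is a placeholder,
50070 is the default HTTP port):

wget -O fsimage 'http://namenode.example.com:50070/getimage?getimage=1'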

On Mon, Jul 14, 2014 at 11:56 PM, Ashish Dobhal
 wrote:
> Could I download the fsimage of a hadoop 1.0 using wget and then interpret
> it in offline mode using the tool in the hadoop 1.2 or higher
> distributions.I guess the structure of fsimage would be same for both the
> distributions.
>
>
> On Mon, Jul 14, 2014 at 11:53 PM, Ashish Dobhal 
> wrote:
>>
>> Harsh thanks
>>
>>
>> On Mon, Jul 14, 2014 at 11:39 PM, Harsh J  wrote:
>>>
>>> The OIV for 1.x series is available in release 1.2.0 and higher. You
>>> can use it from the 'hadoop oiv' command.
>>>
>>> It is not available in 1.0.x.
>>>
>>> On Mon, Jul 14, 2014 at 9:49 PM, Ashish Dobhal
>>>  wrote:
>>> > Hey everyone ;
>>> > Could anyone tell me how to use the OIV tool in hadoop 1.0 as there is
>>> > no
>>> > hdfs.sh file there.
>>> > Thanks.
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>



-- 
Harsh J


Re: OIV Compatibility

2014-07-14 Thread Ashish Dobhal
Could I download the fsimage of a Hadoop 1.0 cluster using wget and then
interpret it in offline mode using the tool from the Hadoop 1.2 or higher
distributions? I guess the structure of the fsimage would be the same for both
distributions.


On Mon, Jul 14, 2014 at 11:53 PM, Ashish Dobhal 
wrote:

> Harsh thanks
>
>
> On Mon, Jul 14, 2014 at 11:39 PM, Harsh J  wrote:
>
>> The OIV for 1.x series is available in release 1.2.0 and higher. You
>> can use it from the 'hadoop oiv' command.
>>
>> It is not available in 1.0.x.
>>
>> On Mon, Jul 14, 2014 at 9:49 PM, Ashish Dobhal
>>  wrote:
>> > Hey everyone ;
>> > Could anyone tell me how to use the OIV tool in hadoop 1.0 as there is
>> no
>> > hdfs.sh file there.
>> > Thanks.
>>
>>
>>
>> --
>> Harsh J
>>
>
>


Re: OIV Compatibility

2014-07-14 Thread Ashish Dobhal
Harsh thanks


On Mon, Jul 14, 2014 at 11:39 PM, Harsh J  wrote:

> The OIV for 1.x series is available in release 1.2.0 and higher. You
> can use it from the 'hadoop oiv' command.
>
> It is not available in 1.0.x.
>
> On Mon, Jul 14, 2014 at 9:49 PM, Ashish Dobhal
>  wrote:
> > Hey everyone ;
> > Could anyone tell me how to use the OIV tool in hadoop 1.0 as there is no
> > hdfs.sh file there.
> > Thanks.
>
>
>
> --
> Harsh J
>


Re: OIV Compatibility

2014-07-14 Thread Harsh J
The OIV for 1.x series is available in release 1.2.0 and higher. You
can use it from the 'hadoop oiv' command.
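
A typical invocation would look something like this (just a sketch; check the
usage output of 'hadoop oiv' for the exact flags and processor names in your
release):

hadoop oiv -i /path/to/fsimage -o fsimage.txt -p Indented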

It is not available in 1.0.x.

On Mon, Jul 14, 2014 at 9:49 PM, Ashish Dobhal
 wrote:
> Hey everyone ;
> Could anyone tell me how to use the OIV tool in hadoop 1.0 as there is no
> hdfs.sh file there.
> Thanks.



-- 
Harsh J


Not able to place enough replicas

2014-07-14 Thread Bogdan Raducanu
I'm getting this error while writing many files.
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not
able to place enough replicas, still in need of 4 to reach 4

I've set logging to DEBUG but still there is no reason printed. There
should've been a reason after this line but instead there's just an empty
line.
Has anyone seen something like this before? It is seen on a 4-node cluster
running Hadoop 2.2.


org.apache.hadoop.hdfs.StateChange: *DIR* NameNode.create: file /file_1002
for DFSClient_NONMAPREDUCE_839626346_1 at 192.168.180.1
org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile:
src=/file_1002, holder=DFSClient_NONMAPREDUCE_839626346_1,
clientMachine=192.168.180.1, createParent=true, replication=4,
createFlag=[CREATE, OVERWRITE]
org.apache.hadoop.hdfs.StateChange: DIR* addFile: /file_1002 is added
org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: add
/file_1002 to namespace for DFSClient_NONMAPREDUCE_839
<< ... many other operations ... >>
8 seconds later:
org.apache.hadoop.hdfs.StateChange: *BLOCK* NameNode.addBlock: file
/file_1002 fileId=189252 for DFSClient_NONMAPREDUCE_839626346_1
org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getAdditionalBlock:
file /file_1002 for DFSClient_NONMAPREDUCE_839626346_1
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not
able to place enough replicas, still in need of 4 to reach 4
<< EMPTY LINE >>
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:test (auth:SIMPLE) cause:java.io.IOException: File /file_1002 could only
be replicated to 0 nodes instead of minReplication (=1).  There are 4
datanode(s) running and no node(s) are excluded in this operation.
org.apache.hadoop.ipc.Server: IPC Server handler 9 on 8020, call
org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from
192.168.180.1:49592 Call#1321 Retry#0: error: java.io.IOException: File
/file_1002 could only be replicated to 0 nodes instead of minReplication
(=1).  There are 4 datanode(s) running and no node(s) are excluded in this
operation.
java.io.IOException: File /file_1002 could only be replicated to 0 nodes
instead of minReplication (=1).  There are 4 datanode(s) running and no
node(s) are excluded in this operation.
at
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042):0


Re: changing split size in Hadoop configuration

2014-07-14 Thread Adam Kawa
It sounds like a JobTracker setting, so a restart looks to be required.

You can verify it in pseudo-distributed mode by setting it to a very low value,
restarting the JT, and seeing if you get the exception that prints this new value.

Sent from my iPhone

> On 14 jul 2014, at 16:03, Jan Warchoł  wrote:
> 
> Hello,
> 
> I recently got "Split metadata size exceeded 1000" error when running 
> Cascading jobs with very big joins.  I found that I should change 
> mapreduce.jobtracker.split.metainfo.maxsize property in hadoop configuration 
> by adding this to the mapred-site.xml file:
> 
>   <property>
>     <name>mapreduce.jobtracker.split.metainfo.maxsize</name>
>     <value>10</value>
>   </property>
> 
> but it didn't seem to have any effect - I'm probably doing something wrong.
> 
> Where should I add this change so that is has the desired effect?  Do I 
> understand correctly that jobtracker restart is required after making the 
> change? The cluster I'm working on has Hadoop 1.0.4.
> 
> thanks for any help,
> -- 
> Jan Warchoł
> Software Engineer
> 
> -
> M: +48 509 078 203
> E: jan.warc...@codilime.com
> -
> CodiLime Sp. z o.o. - Ltd. company with its registered office in Poland, 
> 01-167 Warsaw, ul. Zawiszy 14/97. Registered by The District Court for the 
> Capital City of Warsaw, XII Commercial Department of the National Court 
> Register. Entered into National Court Register under No. KRS 388871. Tax 
> identification number (NIP) 5272657478. Statistical number (REGON) 142974628.


OIV Compatibility

2014-07-14 Thread Ashish Dobhal
Hey everyone,
Could anyone tell me how to use the OIV tool in Hadoop 1.0, as there is no
hdfs.sh file there?
Thanks.


changing split size in Hadoop configuration

2014-07-14 Thread Jan Warchoł
Hello,

I recently got a "Split metadata size exceeded 1000" error when running
Cascading jobs with very big joins.  I found that I should change the
mapreduce.jobtracker.split.metainfo.maxsize property in the Hadoop
configuration by adding this to the mapred-site.xml file:

  <property>
    <name>mapreduce.jobtracker.split.metainfo.maxsize</name>
    <value>10</value>
  </property>

but it didn't seem to have any effect - I'm probably doing something wrong.

Where should I add this change so that it has the desired effect?  Do I
understand correctly that a jobtracker restart is required after making the
change? The cluster I'm working on has Hadoop 1.0.4.

thanks for any help,
-- 
*Jan Warchoł*
*Software Engineer*

-
M: +48 509 078 203
 E: jan.warc...@codilime.com
-
CodiLime Sp. z o.o. - Ltd. company with its registered office in Poland,
01-167 Warsaw, ul. Zawiszy 14/97. Registered by The District Court for the
Capital City of Warsaw, XII Commercial Department of the National Court
Register. Entered into National Court Register under No. KRS 388871.
Tax identification number (NIP) 5272657478. Statistical number
(REGON) 142974628.


Re: Block should be additionally replicated on 1 more rack(s)

2014-07-14 Thread 风雨无阻
Hi,
  I didn't try the Hadoop rebalancer, because I remember the rebalancer only
considers disk load and doesn't take into account which rack the data blocks
are on.
  I can try it. Thank you for your reply.





-- Original Message --
From: "Yehia Elshater";
Sent: Monday, July 14, 2014, 4:52 PM
To: "user";

Subject: Re: Block should be additionally replicated on 1 more rack(s)



Hi,

Did you try Hadoop rebalancer ?


http://hadoop.apache.org/docs/r1.0.4/hdfs_user_guide.html#Rebalancer
 





On 14 July 2014 04:10, 风雨无阻 <232341...@qq.com> wrote:
 HI all:


After the cluster configuration rack awareness,run " hadoop fsck / " 
 A lot of the following error occurred:
 Replica placement policy is violated for blk_-1267324897180563985_11130670. 
Block should be additionally replicated on 1 more rack(s).


Online said "The reason is that three copies on the same rack" .
 The solution is now:
hadoop dfs -setrep 4  
/user/hive/warehouse/tbl_add_av_errorlog_android/dt=2013-08-24/04_0
sleep N
hadoop dfs -setrep 3 
/user/hive/warehouse/tbl_add_av_errorlog_android/dt=2013-08-24/04_0
 But the speed is very slow.


What is a better way to make HDFS healthy again?
 


Thanks,

Ma Jian

Re: Block should be additionally replicated on 1 more rack(s)

2014-07-14 Thread Yehia Elshater
Hi,

Did you try the Hadoop rebalancer?

http://hadoop.apache.org/docs/r1.0.4/hdfs_user_guide.html#Rebalancer



On 14 July 2014 04:10, 风雨无阻 <232341...@qq.com> wrote:

> HI all:
>
> After the cluster configuration rack awareness,run " hadoop fsck / "
> A lot of the following error occurred:
>  Replica placement policy is violated for
> blk_-1267324897180563985_11130670. Block should be additionally replicated
> on 1 more rack(s).
>
> Online said "The reason is that three copies on the same rack" .
> The solution is now:
> hadoop dfs -setrep 4
>  /user/hive/warehouse/tbl_add_av_errorlog_android/dt=2013-08-24/04_0
> sleep N
> hadoop dfs -setrep 3
> /user/hive/warehouse/tbl_add_av_errorlog_android/dt=2013-08-24/04_0
> But the speed is very slow.
>
> What is a better way to make HDFS healthy again?
>
> Thanks,
> Ma Jian
>


Block should be additionally replicated on 1 more rack(s)

2014-07-14 Thread 风雨无阻
Hi all,


After configuring rack awareness on the cluster, running "hadoop fsck /"
reports a lot of the following errors:
 Replica placement policy is violated for blk_-1267324897180563985_11130670.
Block should be additionally replicated on 1 more rack(s).


Posts online say the reason is that all three replicas are on the same rack.
The workaround I am using now is:
hadoop dfs -setrep 4 /user/hive/warehouse/tbl_add_av_errorlog_android/dt=2013-08-24/04_0
sleep N
hadoop dfs -setrep 3 /user/hive/warehouse/tbl_add_av_errorlog_android/dt=2013-08-24/04_0
But this is very slow.


What is a better way to make HDFS healthy again?



Thanks,

Ma Jian

Re: Does Hadoop (version 2.4.1) support symbolic links?

2014-07-14 Thread Akira AJISAKA

Hadoop 2.4.1 doesn't support symbolic links.

(2014/07/14 11:34), cho ju il wrote:

My Hadoop version is 2.4.1.

Does HDFS (version 2.4.1) support symbolic links?

How do I create symbolic links?