Re: Hadoop advantages vs Traditional Relational DB

2013-02-25 Thread Panshul Whisper
Well, in that case... HBase is more of a data store than a database system.
HBase does not have the efficient indexing, sorting, and averaging
functions that a normal database has. It is good for random reads and writes
over huge amounts of data distributed among column families.
You can get a much more detailed answer on Google.

Regards,
Panshul
On Feb 26, 2013 4:28 AM, "anil gupta"  wrote:

> Hadoop is not a database. So , why would you do comparison?
> HBase vs Traditional RDBMS might sound ok.
> On Feb 25, 2013 5:15 AM, "Oleg Ruchovets"  wrote:
>
>> Hi ,
>>Can you please share hadoop advantages vs Traditional Relational DB.
>> A link to short presentations or benchmarks would be great.
>>
>> Thanks.
>> Oleg.
>>
>


Re: Encryption in HDFS

2013-02-25 Thread Seonyeong Bak
Thank you so much for all your comments. :)


Re: MS sql server hadoop connector

2013-02-25 Thread Alexander Alten-Lorenz
Hi,

+ us...@sqoop.apache.org
- user@hadoop.apache.org

Hi,

I moved the thread to the sqoop mailing list.
The main error indicates what's going wrong:
> mssqoop-sqlserver: java.io.IOException: the content of connector file must be 
> in form of key=value
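
For reference, the file under conf/managers.d is expected to contain a single
key=value line mapping the connector's ManagerFactory class to the jar that
provides it, roughly like the line below (the class name and path here are an
assumption based on the MS connector distribution; check the connector's own
install guide and adjust to your installation):

com.microsoft.sqoop.SqlServer.MSSQLServerManagerFactory=/usr/lib/sqoop/lib/sqoop-sqlserver-1.0.jar

Separately, the later UnsupportedOperationException about JRE 1.6 usually means
the JDBC 3.0 sqljdbc.jar is on the classpath instead of sqljdbc4.jar.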

As of Sqoop 1.4.2 we support NVARCHAR for import/export, but Sqoop doesn't have a
valid splitter for this kind of datatype. We had such a thread in the past; follow
the instructions here: 
http://mail-archives.apache.org/mod_mbox/sqoop-user/201210.mbox/%3C20121026221855.GG12835@jarcec-thinkpad%3E

- Alex


On Feb 25, 2013, at 10:33 PM, Swapnil Shinde  wrote:

> Hello
> I am newly trying to work with SQL server hadoop connector. We have installed 
> sqoop and SQL server connector properly. but i m getting below error while 
> running import command.
> I am not sure how to proceed with this so any help will be really great..
> 
> 13/02/25 16:18:32 ERROR sqoop.ConnFactory: Error loading ManagerFactory 
> information from file 
> /opt/mapr/sqoop/sqoop-1.4.2/bin/../conf/managers.d/mssqoop-sqlserver: 
> java.io.IOException: the content of connector file must be in form of 
> key=value
>   at 
> org.apache.sqoop.ConnFactory.addManagersFromFile(ConnFactory.java:219)
>   at 
> org.apache.sqoop.ConnFactory.loadManagersFromConfDir(ConnFactory.java:294)
>   at 
> org.apache.sqoop.ConnFactory.instantiateFactories(ConnFactory.java:85)
>   at org.apache.sqoop.ConnFactory.<init>(ConnFactory.java:62)
>   at com.cloudera.sqoop.ConnFactory.<init>(ConnFactory.java:36)
>   at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:201)
>   at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:83)
>   at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:464)
>   at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
>   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
>   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
>   at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
> 
> 13/02/25 16:18:32 INFO manager.SqlManager: Using default fetchSize of 1000
> 13/02/25 16:18:32 INFO tool.CodeGenTool: Beginning code generation
> Feb 25, 2013 4:18:32 PM com.microsoft.sqlserver.jdbc.SQLServerConnection <init>
> 
> SEVERE: Java Runtime Environment (JRE) version 1.6 is not supported by this 
> driver. Use the sqljdbc4.jar class library, which provides support for JDBC 
> 4.0.
> 13/02/25 16:18:32 ERROR sqoop.Sqoop: Got exception running Sqoop: 
> java.lang.UnsupportedOperationException: Java Runtime Environment (JRE) 
> version 1.6 is not supported by this driver. Use the sqljdbc4.jar class 
> library, which provides support for JDBC 4.0.
> java.lang.UnsupportedOperationException: Java Runtime Environment (JRE) 
> version 1.6 is not supported by this driver. Use the sqljdbc4.jar class 
> library, which provides support for JDBC 4.0.
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerConnection.<init>(SQLServerConnection.java:238)
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:841)
>   at java.sql.DriverManager.getConnection(DriverManager.java:582)
>   at java.sql.DriverManager.getConnection(DriverManager.java:207)
>   at 
> org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:663)
>   at 
> org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
>   at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:525)
>   at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:548)
>   at 
> org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:191)
>   at 
> org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:175)
>   at 
> org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:262)
>   at 
> org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1235)
>   at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1060)
>   at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:82)
>   at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:390)
>   at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:476)
>   at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
>   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
>   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
>   at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
> 
> 
> 
> 

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF



Re: Encryption in HDFS

2013-02-25 Thread Mathias Herberts
Encryption without proper key management only addresses the 'stolen
hard drive' problem.

So far I have not found 100% satisfactory solutions to this hard
problem. I've written OSS (Open Secret Server) partly to address this
problem in Pig, i.e. accessing encrypted data without embedding key
info into the job description file. Proper encrypted data handling
implies strict code review though, as in the case of Pig, databags are
spillable and you could end up with unencrypted data stored on disk
without intent.

OSS http://github.com/hbs/oss and the Pig specific code:
https://github.com/hbs/oss/blob/master/src/main/java/com/geoxp/oss/pig/PigSecretStore.java

On Tue, Feb 26, 2013 at 6:33 AM, Seonyeong Bak  wrote:
> I didn't handle a key distribution problem because I thought that this
> problem is more difficult.
> I simply hardcode a key into the code.
>
> A challenge related to security are handled in HADOOP-9331, MAPREDUCE-5025,
> and so on.


Re:

2013-02-25 Thread Ling Kun
Hi, lahir:

you can start by setting  up a hadoop cluster, and look into the
bin/start-all.sh shell script.


Ling Kun


On Tue, Feb 26, 2013 at 12:28 PM, lahir marni  wrote:

> hi,
>
> when i open the source of hadoop-1.0.4 i can see many files. I donot
> understand which one to start. can you suggeest me a way to understand the
> source code for hadoop-1.0.4
>
> thanks,
> Lahir
>



-- 
http://www.lingcc.com


Re: Encryption in HDFS

2013-02-25 Thread Ted Dunning
Most recent crypto libraries use the special instructions on Intel
processors.

See for instance:
http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-aes-instructions-set


On Mon, Feb 25, 2013 at 9:10 PM, Seonyeong Bak  wrote:

> Hello, I'm a university student.
>
> I implemented AES and Triple DES with CompressionCodec in java
> cryptography architecture (JCA)
> The encryption is performed by a client node using Hadoop API.
> Map tasks read blocks from HDFS and these blocks are decrypted by each map
> tasks.
> I tested my implementation with generic HDFS.
> My cluster consists of 3 nodes (1 master node, 3 worker nodes) and each
> machines have quad core processor (i7-2600) and 4GB memory.
> A test input is 1TB text file which consists of 32 multiple text files (1
> text file is 32GB)
>
> I expected that the encryption takes much more time than generic HDFS.
> The performance does not differ significantly.
> The decryption step takes about 5-7% more than generic HDFS.
> The encryption step takes about 20-30% more than generic HDFS because it
> is implemented by single thread and executed by 1 client node.
> So the encryption can get more performance.
>
> May there be any error in my test?
>
> I know there are several implementation for encryting files in HDFS.
> Are these implementations enough to secure HDFS?
>
> best regards,
>
> seonpark
>
> * Sorry for my bad english
>
>


[no subject]

2013-02-25 Thread lahir marni
hi,

when I open the source of hadoop-1.0.4 I can see many files. I do not
understand which one to start with. Can you suggest a way to understand the
source code of hadoop-1.0.4?

thanks,
Lahir


Re: Encryption in HDFS

2013-02-25 Thread Seonyeong Bak
I didn't handle the key distribution problem because I thought that this
problem is more difficult.
I simply hardcoded a key into the code.

Challenges related to security are handled in HADOOP-9331, MAPREDUCE-5025,
and so on.


Re: Encryption in HDFS

2013-02-25 Thread Ted Yu
The following JIRAs are related to your research:

HADOOP-9331: Hadoop crypto codec framework and crypto codec implementations
(https://issues.apache.org/jira/browse/hadoop-9331) and related sub-tasks

MAPREDUCE-5025: Key Distribution and Management for supporting crypto codec
in Map Reduce, and related JIRAs

On Mon, Feb 25, 2013 at 9:10 PM, Seonyeong Bak  wrote:

> Hello, I'm a university student.
>
> I implemented AES and Triple DES with CompressionCodec in java
> cryptography architecture (JCA)
> The encryption is performed by a client node using Hadoop API.
> Map tasks read blocks from HDFS and these blocks are decrypted by each map
> tasks.
> I tested my implementation with generic HDFS.
> My cluster consists of 3 nodes (1 master node, 3 worker nodes) and each
> machines have quad core processor (i7-2600) and 4GB memory.
> A test input is 1TB text file which consists of 32 multiple text files (1
> text file is 32GB)
>
> I expected that the encryption takes much more time than generic HDFS.
> The performance does not differ significantly.
> The decryption step takes about 5-7% more than generic HDFS.
> The encryption step takes about 20-30% more than generic HDFS because it
> is implemented by single thread and executed by 1 client node.
> So the encryption can get more performance.
>
> May there be any error in my test?
>
> I know there are several implementation for encryting files in HDFS.
> Are these implementations enough to secure HDFS?
>
> best regards,
>
> seonpark
>
> * Sorry for my bad english
>
>


Re: Encryption in HDFS

2013-02-25 Thread lohit
Another challenge of encrypt/decrypt is key management.
Can you share how this is handled in your implementation/research?

2013/2/25 Seonyeong Bak 

> Hello, I'm a university student.
>
> I implemented AES and Triple DES with CompressionCodec in java
> cryptography architecture (JCA)
> The encryption is performed by a client node using Hadoop API.
> Map tasks read blocks from HDFS and these blocks are decrypted by each map
> tasks.
> I tested my implementation with generic HDFS.
> My cluster consists of 3 nodes (1 master node, 3 worker nodes) and each
> machines have quad core processor (i7-2600) and 4GB memory.
> A test input is 1TB text file which consists of 32 multiple text files (1
> text file is 32GB)
>
> I expected that the encryption takes much more time than generic HDFS.
> The performance does not differ significantly.
> The decryption step takes about 5-7% more than generic HDFS.
> The encryption step takes about 20-30% more than generic HDFS because it
> is implemented by single thread and executed by 1 client node.
> So the encryption can get more performance.
>
> May there be any error in my test?
>
> I know there are several implementation for encryting files in HDFS.
> Are these implementations enough to secure HDFS?
>
> best regards,
>
> seonpark
>
> * Sorry for my bad english
>
>


-- 
Have a Nice Day!
Lohit


Encryption in HDFS

2013-02-25 Thread Seonyeong Bak
Hello, I'm a university student.

I implemented AES and Triple DES with CompressionCodec in the Java cryptography
architecture (JCA).
The encryption is performed by a client node using the Hadoop API.
Map tasks read blocks from HDFS and these blocks are decrypted by each map
task.
I tested my implementation against generic HDFS.
My cluster consists of 3 nodes (1 master node, 3 worker nodes) and each
machine has a quad-core processor (i7-2600) and 4GB of memory.
A test input is a 1TB text file which consists of 32 text files (1
text file is 32GB).

I expected that the encryption would take much more time than generic HDFS,
but the performance does not differ significantly.
The decryption step takes about 5-7% more time than generic HDFS.
The encryption step takes about 20-30% more time than generic HDFS because it is
implemented with a single thread and executed by 1 client node.
So the encryption could still gain more performance.

Might there be any error in my test?

I know there are several implementations for encrypting files in HDFS.
Are these implementations enough to secure HDFS?

best regards,

seonpark

* Sorry for my bad English
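
For readers curious what the JCA approach looks like, here is a minimal sketch of
encrypting on the client side while writing to HDFS. It is only an illustration of
the general technique, not the poster's actual CompressionCodec implementation, and
the hard-coded key/IV deliberately mirror the key-management gap discussed in this
thread:

import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EncryptingHdfsWriter {
  public static void main(String[] args) throws Exception {
    // Hard-coded key and IV purely for illustration; this is exactly the
    // key distribution problem raised elsewhere in the thread.
    byte[] key = "0123456789abcdef".getBytes("UTF-8"); // 128-bit AES key
    byte[] iv  = "fedcba9876543210".getBytes("UTF-8"); // 16-byte IV

    Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
    cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
        new IvParameterSpec(iv));

    // Write ciphertext to HDFS by wrapping the output stream in a CipherOutputStream.
    FileSystem fs = FileSystem.get(new Configuration());
    CipherOutputStream out =
        new CipherOutputStream(fs.create(new Path(args[0])), cipher);
    out.write("some plaintext record\n".getBytes("UTF-8"));
    out.close(); // flushes the final padded block
  }
}

Reading the data back is symmetric: wrap fs.open(path) in a CipherInputStream
initialized with DECRYPT_MODE and the same key/IV.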


Re: Hadoop efficient resource isolation

2013-02-25 Thread Marcin Mejran
That won't stop a bad job (say a fork bomb or a massive memory leak in a
streaming script) from taking out a node, which is what I believe Dhanasekaran
was asking about. He wants to physically isolate certain jobs to certain "non
critical" nodes. I don't believe this is possible, and data would be spread to
those nodes, assuming they're data nodes, which would still cause cluster-wide
issues (and if the data is isolated, why not have two separate clusters?).

I've read references in the docs about some type of memory-based constraints in
Hadoop but I don't know the details. Anyone know how they work?

Also, I believe there are tools in Linux that can kill processes in case of 
memory issues and otherwise restrict what a certain user can do. These seem 
like a more flexible solution although they won't cover all potential issues.

-Marcin

On Feb 25, 2013, at 7:20 PM, "Arun C Murthy" 
mailto:a...@hortonworks.com>> wrote:

CapacityScheduler is what you want...

On Feb 21, 2013, at 5:16 AM, Dhanasekaran Anbalagan wrote:

Hi Guys,

It's possible isolation job submission for hadoop cluster, we currently running 
48 machine cluster. we  monitor Hadoop is not provides efficient resource 
isolation. In my case we ran for tech and research pool, When tech job some 
memory leak will haven, It's occupy the hole cluster.  Finally we figure out  
issue with tech job. It's  screwed up hole hadoop cluster. finally 10 data node 
 are dead.

Any prevention of job submission efficient way resource allocation. When 
something wrong in   particular job, effect particular pool, Not effect others 
job. Any way to archive this

Please guide me guys.

My idea is, When tech user submit job means only apply job in for my case 
submit 24 machine. other machine only for research user.

It's will prevent the memory leak problem.


-Dhanasekaran.
Did I learn something today? If not, I wasted it.

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/




Re: Hadoop advantages vs Traditional Relational DB

2013-02-25 Thread anil gupta
Hadoop is not a database, so why would you do a comparison?
HBase vs Traditional RDBMS might sound OK.
On Feb 25, 2013 5:15 AM, "Oleg Ruchovets"  wrote:

> Hi ,
>Can you please share hadoop advantages vs Traditional Relational DB.
> A link to short presentations or benchmarks would be great.
>
> Thanks.
> Oleg.
>


Re: Datanodes shutdown and HBase's regionservers not working

2013-02-25 Thread Davey Yan
Hi Nicolas,

I think I found what led to the shutdown of all of the datanodes, but I am
not completely certain.
I will return to this mailing list when my cluster is stable again.

On Mon, Feb 25, 2013 at 8:01 PM, Nicolas Liochon  wrote:
> Network error messages are not always friendly, especially if there is a
> misconfiguration.
> This said,  "connection refused" says that the network connection was made,
> but that the remote port was not opened on the remote box. I.e. the process
> was dead.
> It could be useful to pastebin the whole logs as well...
>
>
> On Mon, Feb 25, 2013 at 12:44 PM, Davey Yan  wrote:
>>
>> But... there was no log like "network unreachable".
>>
>>
>> On Mon, Feb 25, 2013 at 6:07 PM, Nicolas Liochon 
>> wrote:
>> > I agree.
>> > Then for HDFS, ...
>> > The first thing to check is the network I would say.
>> >
>> >
>> >
>> >
>> > On Mon, Feb 25, 2013 at 10:46 AM, Davey Yan  wrote:
>> >>
>> >> Thanks for reply, Nicolas.
>> >>
>> >> My question: What can lead to shutdown of all of the datanodes?
>> >> I believe that the regionservers will be OK if the HDFS is OK.
>> >>
>> >>
>> >> On Mon, Feb 25, 2013 at 5:31 PM, Nicolas Liochon 
>> >> wrote:
>> >> > Ok, what's your question?
>> >> > When you say the datanode went down, was it the datanode processes or
>> >> > the
>> >> > machines, with both the datanodes and the regionservers?
>> >> >
>> >> > The NameNode pings its datanodes every 3 seconds. However it will
>> >> > internally
>> >> > mark the datanodes as dead after 10:30 minutes (even if in the gui
>> >> > you
>> >> > have
>> >> > 'no answer for x minutes').
>> >> > HBase monitoring is done by ZooKeeper. By default, a regionserver is
>> >> > considered as dead after 180s with no answer. Before, well, it's
>> >> > considered
>> >> > as live.
>> >> > When you stop a regionserver, it tries to flush its data to the disk
>> >> > (i.e.
>> >> > hdfs, i.e. the datanodes). That's why if you have no datanodes, or if
>> >> > a
>> >> > high
>> >> > ratio of your datanodes are dead, it can't shutdown. Connection
>> >> > refused
>> >> > &
>> >> > socket timeouts come from the fact that before the 10:30 minutes hdfs
>> >> > does
>> >> > not declare the nodes as dead, so hbase tries to use them (and,
>> >> > obviously,
>> >> > fails). Note that there is now  an intermediate state for hdfs
>> >> > datanodes,
>> >> > called "stale": an intermediary state where the datanode is used only
>> >> > if
>> >> > you
>> >> > have to (i.e. it's the only datanode with a block replica you need).
>> >> > It
>> >> > will
>> >> > be documented in HBase for the 0.96 release. But if all your
>> >> > datanodes
>> >> > are
>> >> > down it won't change much.
>> >> >
>> >> > Cheers,
>> >> >
>> >> > Nicolas
>> >> >
>> >> >
>> >> >
>> >> > On Mon, Feb 25, 2013 at 10:10 AM, Davey Yan 
>> >> > wrote:
>> >> >>
>> >> >> Hey guys,
>> >> >>
>> >> >> We have a cluster with 5 nodes(1 NN and 4 DNs) running for more than
>> >> >> 1
>> >> >> year, and it works fine.
>> >> >> But the datanodes got shutdown twice in the last month.
>> >> >>
>> >> >> When the datanodes got shutdown, all of them became "Dead Nodes" in
>> >> >> the NN web admin UI(http://ip:50070/dfshealth.jsp),
>> >> >> but regionservers of HBase were still live in the HBase web
>> >> >> admin(http://ip:60010/master-status), of course, they were zombies.
>> >> >> All of the processes of jvm were still running, including
>> >> >> hmaster/namenode/regionserver/datanode.
>> >> >>
>> >> >> When the datanodes got shutdown, the load (using the "top" command)
>> >> >> of
>> >> >> slaves became very high, more than 10, higher than normal running.
>> >> >> From the "top" command, we saw that the processes of datanode and
>> >> >> regionserver were comsuming CPU.
>> >> >>
>> >> >> We could not stop the HBase or Hadoop cluster through normal
>> >> >> commands(stop-*.sh/*-daemon.sh stop *).
>> >> >> So we stopped datanodes and regionservers by kill -9 PID, then the
>> >> >> load of slaves returned to normal level, and we start the cluster
>> >> >> again.
>> >> >>
>> >> >>
>> >> >> Log of NN at the shutdown point(All of the DNs were removed):
>> >> >> 2013-02-22 11:10:02,278 INFO org.apache.hadoop.net.NetworkTopology:
>> >> >> Removing a node: /default-rack/192.168.1.152:50010
>> >> >> 2013-02-22 11:10:02,278 INFO org.apache.hadoop.hdfs.StateChange:
>> >> >> BLOCK* NameSystem.heartbeatCheck: lost heartbeat from
>> >> >> 192.168.1.149:50010
>> >> >> 2013-02-22 11:10:02,693 INFO org.apache.hadoop.net.NetworkTopology:
>> >> >> Removing a node: /default-rack/192.168.1.149:50010
>> >> >> 2013-02-22 11:10:02,693 INFO org.apache.hadoop.hdfs.StateChange:
>> >> >> BLOCK* NameSystem.heartbeatCheck: lost heartbeat from
>> >> >> 192.168.1.150:50010
>> >> >> 2013-02-22 11:10:03,004 INFO org.apache.hadoop.net.NetworkTopology:
>> >> >> Removing a node: /default-rack/192.168.1.150:50010
>> >> >> 2013-02-22 11:10:03,004 INFO org.apache.hadoop.hdfs.StateChange:
>> >> >> BLOCK* NameSystem.heartbeatCheck:

Re: How to find Replication factor for one perticular folder in HDFS

2013-02-25 Thread YouPeng Yang
Hi Dhanasekaran Anbalagan

  1. To get the value of the key from the configuration:

     bin/hdfs getconf -confKey dfs.replication


   2. Maybe you can add the attribute <final>true</final> to your
   dfs.replication:

   <property>
     <name>dfs.replication</name>
     <value>2</value>
     <final>true</final>
   </property>

regards.
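
To check what is actually set on the files under a particular folder (rather
than the configured default), the standard tools report it per file; e.g. for
an example folder /myfolder:

hadoop fs -ls /myfolder                  (the second column is each file's replication factor)
hadoop fsck /myfolder -files -blocks     (prints repl=N for every block)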



2013/2/26 Nitin Pawar 

> see if the link below helps you
>
>
> http://www.michael-noll.com/blog/2011/10/20/understanding-hdfs-quotas-and-hadoop-fs-and-fsck-tools/
>
>
> On Mon, Feb 25, 2013 at 10:36 PM, Dhanasekaran Anbalagan <
> bugcy...@gmail.com> wrote:
>
>> Hi Guys,
>>
>> How to query particular folder witch replication factor configured. In my
>> cluster some folder in HDFS configured 2 and some of them configured as
>> three. How to query.
>>
>> please guide me
>>
>> -Dhanasekaran
>>
>> Did I learn something today? If not, I wasted it.
>>
>
>
>
> --
> Nitin Pawar
>


Re: Hadoop efficient resource isolation

2013-02-25 Thread Arun C Murthy
CapacityScheduler is what you want...
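
For context, a minimal sketch of what that looks like on Hadoop 1.x. The property
names come from the CapacityScheduler documentation, while the queue names and
percentages are only illustrative for the tech/research split described below:

<!-- mapred-site.xml -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
<property>
  <name>mapred.queue.names</name>
  <value>tech,research</value>
</property>

<!-- capacity-scheduler.xml -->
<property>
  <name>mapred.capacity-scheduler.queue.tech.capacity</name>
  <value>50</value>
</property>
<property>
  <name>mapred.capacity-scheduler.queue.research.capacity</name>
  <value>50</value>
</property>

Jobs are then submitted with -Dmapred.job.queue.name=tech (or research), and each
queue is limited to its share of the cluster.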

On Feb 21, 2013, at 5:16 AM, Dhanasekaran Anbalagan wrote:

> Hi Guys,
> 
> It's possible isolation job submission for hadoop cluster, we currently 
> running 48 machine cluster. we  monitor Hadoop is not provides efficient 
> resource isolation. In my case we ran for tech and research pool, When tech 
> job some memory leak will haven, It's occupy the hole cluster.  Finally we 
> figure out  issue with tech job. It's  screwed up hole hadoop cluster. 
> finally 10 data node  are dead.
> 
> Any prevention of job submission efficient way resource allocation. When 
> something wrong in   particular job, effect particular pool, Not effect 
> others job. Any way to archive this
> 
> Please guide me guys.
> 
> My idea is, When tech user submit job means only apply job in for my case 
> submit 24 machine. other machine only for research user.
> 
> It's will prevent the memory leak problem. 
>  
> 
> -Dhanasekaran.
> Did I learn something today? If not, I wasted it.

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/




Re: Getting Giraph to build/run with CDH4 (cannot initialize cluster - check mapreduce.framework.name)

2013-02-25 Thread Arun C Murthy
Pls ask CDH lists.

On Feb 25, 2013, at 8:26 AM, David Boyd wrote:

> All:
>   I am trying to get the Giraph 0.2 snapshot (pulled via GIT on Friday)
> to build and run with CDH4.
> 
> I modified the pom.xml to provide a profile for my specific version (4.1.1).
> The build works (mvn -Phadoop_cdh4.1.1 clean package test) and passes
> all the tests.
> 
> If I try to do the next step and submit to my cluster with the command:
> mvn -Phadoop_cdh4.1.1 test -Dprop.mapred.job.tracker=10.1.94.53:8021 
> -Dgiraph.zkList=10.1.94.104:2181
> 
> the JSON test in core fails.  If I move that test out of the way a whole 
> bunch of tests in examples
> fail.  They all fail with:
>> java.io.IOException: Cannot initialize Cluster. Please check your
>> configuration for mapreduce.framework.name and the correspond server
>> addresses.
> 
> I have tried passing mapreduce.framework.name as both local and classic.   I 
> have also set those values in my mapreduce-site.xml.
> 
> Interestingly I can run the pagerank benchmark in code with the command:
>> hadoop jar
>> ./giraph-core/target/giraph-0.2-SNAPSHOT-for-hadoop-2.0.0-cdh4.1.3-jar-with-dependencies.jar
>> org.apache.giraph.benchmark.PageRankBenchmark
>> -Dmapred.child.java-opts="-Xmx64g -Xms64g XX:+UseConcMarkSweepGC
>> -XX:-UseGCOverheadLimit" -Dgiraph.zkList=10.1.94.104:2181 -e 1 -s 3 -v
>> -V 5 -w 83
> And it completes just fine.
> 
> I have searched high and low for documents and examples on how to run the 
> example programs from other
> than maven but have not found any thing.
> 
> Any help or suggestions  would be greatly appreciated.
> 
> THanks.
> 
> 
> 
> -- 
> = mailto:db...@data-tactics.com 
> David W. Boyd
> Director, Engineering, Research and Development
> Data Tactics Corporation
> 7901 Jones Branch, Suite 240
> Mclean, VA 22102
> office:   +1-703-506-3735, ext 308
> fax: +1-703-506-6703
> cell: +1-703-402-7908
> == http://www.data-tactics.com/ 
> 
> The information contained in this message may be privileged
> and/or confidential and protected from disclosure.
> If the reader of this message is not the intended recipient
> or an employee or agent responsible for delivering this message
> to the intended recipient, you are hereby notified that any
> dissemination, distribution or copying of this communication
> is strictly prohibited.  If you have received this communication
> in error, please notify the sender immediately by replying to
> this message and deleting the material from any computer.
> 
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/




Re: ISSUE IN CDH4.1.2 : transfer data between different HDFS clusters.(using distch)

2013-02-25 Thread Arun C Murthy
Pls don't cross-post, belongs only to CDH lists.

On Feb 25, 2013, at 2:24 AM, samir das mohapatra wrote:

> I am using CDH4.1.2 with MRv1 not YARN.
> 
> 
> On Mon, Feb 25, 2013 at 3:47 PM, samir das mohapatra 
>  wrote:
> yes
> 
> 
> On Mon, Feb 25, 2013 at 3:30 PM, Nitin Pawar  wrote:
> does this match with your issue
> 
> https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/kIPOvrFaQE8
> 
> 
> On Mon, Feb 25, 2013 at 3:20 PM, samir das mohapatra 
>  wrote:
> 
> 
> -- Forwarded message --
> From: samir das mohapatra 
> Date: Mon, Feb 25, 2013 at 3:05 PM
> Subject: ISSUE IN CDH4.1.2 : transfer data between different HDFS 
> clusters.(using distch)
> To: cdh-u...@cloudera.org
> 
> 
> Hi All,
>   I am getting bellow error , can any one help me on the same issue,
> 
> ERROR LOG:
> --
> 
> hadoop@hadoophost2:~$ hadoop   distcp 
> hdfs://10.192.200.170:50070/tmp/samir.txt hdfs://10.192.244.237:50070/input
> 13/02/25 01:34:36 INFO tools.DistCp: 
> srcPaths=[hdfs://10.192.200.170:50070/tmp/samir.txt]
> 13/02/25 01:34:36 INFO tools.DistCp: 
> destPath=hdfs://10.192.244.237:50070/input
> With failures, global counters are inaccurate; consider running with -i
> Copy failed: java.io.IOException: Failed on local exception: 
> com.google.protobuf.InvalidProtocolBufferException: Protocol message 
> end-group tag did not match expected tag.; Host Details : local host is: 
> "hadoophost2/10.192.244.237"; destination host is: 
> "bl1slu040.corp.adobe.com":50070; 
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759)
> at org.apache.hadoop.ipc.Client.call(Client.java:1164)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> at $Proxy9.getFileInfo(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> at $Proxy9.getFileInfo(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1507)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:783)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1257)
> at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:636)
> at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol 
> message end-group tag did not match expected tag.
> at 
> com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:73)
> at 
> com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
> at 
> com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:213)
> at 
> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:746)
> at 
> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:238)
> at 
> com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:282)
> at 
> com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:760)
> at 
> com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:288)
> at 
> com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:752)
> at 
> org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcPayloadHeaderProtos.java:985)
> at 
> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:882)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:813)
> 
> 
> 
> Regards,
> samir
> 
> 
> 
> 
> -- 
> Nitin Pawar
> 
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/




Re: Format the harddrive

2013-02-25 Thread Jeffrey Buell
What make and model are these machines? What is the storage controller? You may 
need to go into the storage configuration tool during hardware boot and look at 
how the controller has configured the disks. Maybe they need to be activated. 
Or delete all the "virtual disks" and start over. On HP machines you can use 
the command-line tool "hpacucli" to do this. Other vendors probably have 
similar tools. 

- Original Message -

From: "Rahul1 Shah"  
To: user@hadoop.apache.org 
Sent: Monday, February 25, 2013 2:27:39 PM 
Subject: RE: Format the harddrive 



I used gparted and wiped of every partition disks had. But as soon as I put the 
OS back it gives me the warning “Disks sdc, sde, sdf, sdi, sdk contain BIOS 
RAID metadata but are not part of any recognized BIOS RAID sets/ Ignoring disks 
sdc, sde, sdf, sdi, sdk. It shows me the RAID strips to load OS into and I cant 
remove them after I load OS since it contains the boot. 

Right now trying the dd command over the entire disk to wipe of any data. Hope 
it works. If you have any better solution pls let me know. Thanks 



From: Michael Namaiandeh [mailto:mnamaian...@healthcit.com] 
Sent: Monday, February 25, 2013 3:11 PM 
To: user@hadoop.apache.org 
Subject: RE: Format the harddrive 

Are your partitions LVM or something else? If it’s not LVM then you can use 
GParted to re-configure your LV configuration. 





From: Jeffrey Buell [ mailto:jbu...@vmware.com ] 
Sent: Monday, February 25, 2013 4:10 PM 
To: user@hadoop.apache.org 
Subject: Re: Format the harddrive 


I've installed RHEL 6.1 with none of these problems. Not sure why you can't 
delete the logical volumes, but I suggest not letting the installer do 
automatic disk configuration. Then you can manually select which disks you want 
for the root partitions, their sizes, and how you want them formatted. 
Configure other disks for Hadoop data later, after OS install. 

Jeff 



From: "Rahul1 Shah" < rahul1.s...@intel.com > 
To: user@hadoop.apache.org 
Sent: Monday, February 25, 2013 12:33:17 PM 
Subject: Format the harddrive 
Hi, 

I am installing Hadoop on some systems. For this I want to format the hard 
drive for any previous RAID format or any kind of data. I am facing this 
problem that when I install Redhat 6.2 on these systems it creates a logical 
volume on the disk and does not let me create ext4 partitions on them. Any idea 
how do I format all the disk. 

-Rahul 




RE: Format the harddrive

2013-02-25 Thread Shah, Rahul1
I used gparted and wiped off every partition the disks had. But as soon as I put the
OS back it gives me the warning “Disks sdc, sde, sdf, sdi, sdk contain BIOS
RAID metadata but are not part of any recognized BIOS RAID sets. Ignoring disks
sdc, sde, sdf, sdi, sdk.” It shows me the RAID stripes to load the OS onto and I can't
remove them after I load the OS since it contains the boot.

Right now I am trying the dd command over the entire disk to wipe off any data. Hope
it works. If you have any better solution please let me know. Thanks
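
In case it helps, a sketch of targeting just the metadata areas instead of dd'ing
the whole drive. BIOS/fake-RAID metadata normally lives at the very start and very
end of each disk; the device name below is only an example taken from the warning,
so double-check the names before running anything:

dd if=/dev/zero of=/dev/sdc bs=1M count=16     # clear the start of the disk
dd if=/dev/zero of=/dev/sdc bs=1M count=16 seek=$(( $(blockdev --getsz /dev/sdc) / 2048 - 16 ))   # clear the last 16 MB
dmraid -r -E /dev/sdc                          # or, if dmraid is installed, erase the fake-RAID metadata directly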

From: Michael Namaiandeh [mailto:mnamaian...@healthcit.com]
Sent: Monday, February 25, 2013 3:11 PM
To: user@hadoop.apache.org
Subject: RE: Format the harddrive

Are your partitions LVM or something else? If it’s not LVM then you can use 
GParted to re-configure your LV configuration.



From: Jeffrey Buell [mailto:jbu...@vmware.com]
Sent: Monday, February 25, 2013 4:10 PM
To: user@hadoop.apache.org
Subject: Re: Format the harddrive

I've installed RHEL 6.1 with none of these problems. Not sure why you can't 
delete the logical volumes, but I suggest not letting the installer do 
automatic disk configuration.  Then you can manually select which disks you 
want for the root partitions, their sizes, and how you want them formatted.  
Configure other disks for Hadoop data later, after OS install.

Jeff

From: "Rahul1 Shah" mailto:rahul1.s...@intel.com>>
To: user@hadoop.apache.org
Sent: Monday, February 25, 2013 12:33:17 PM
Subject: Format the harddrive
Hi,

I am installing Hadoop on some systems. For this I want to format the hard 
drive for any previous RAID format or any kind of data. I am facing this 
problem that when I install Redhat 6.2 on these systems it creates a logical 
volume on the disk and does not let me create ext4 partitions on them. Any idea 
how do I format all the disk.

-Rahul




RE: Format the harddrive

2013-02-25 Thread Michael Namaiandeh
Are your partitions LVM or something else? If it’s not LVM then you can use 
GParted to re-configure your LV configuration.



From: Jeffrey Buell [mailto:jbu...@vmware.com]
Sent: Monday, February 25, 2013 4:10 PM
To: user@hadoop.apache.org
Subject: Re: Format the harddrive

I've installed RHEL 6.1 with none of these problems. Not sure why you can't 
delete the logical volumes, but I suggest not letting the installer do 
automatic disk configuration.  Then you can manually select which disks you 
want for the root partitions, their sizes, and how you want them formatted.  
Configure other disks for Hadoop data later, after OS install.

Jeff

From: "Rahul1 Shah" mailto:rahul1.s...@intel.com>>
To: user@hadoop.apache.org
Sent: Monday, February 25, 2013 12:33:17 PM
Subject: Format the harddrive
Hi,

I am installing Hadoop on some systems. For this I want to format the hard 
drive for any previous RAID format or any kind of data. I am facing this 
problem that when I install Redhat 6.2 on these systems it creates a logical 
volume on the disk and does not let me create ext4 partitions on them. Any idea 
how do I format all the disk.

-Rahul




Re: map reduce and sync

2013-02-25 Thread Lucas Bernardi
It looks like getSplits in FileInputFormat is ignoring 0-length files.
That would also explain the weird behavior of tail, which seems to always
jump to the start since the file length is 0.

So, basically, sync doesn't update the file length, and any code based on file
size is unreliable.

Am I right?

How can I get around this?

Lucas
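
For what it's worth, a minimal sketch of the kind of read that does see the
sync'ed bytes (the same behaviour hadoop fs -tail relies on), assuming the
Hadoop 1.x FileSystem API. The point is to ignore the reported file length and
read until EOF, which is exactly what the length-based getSplits logic does not do:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadSyncedFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path(args[0]);

    // getFileStatus(path).getLen() may still report 0 for a file that has only
    // been sync()'ed, so do not size the read by the reported length.
    FSDataInputStream in = fs.open(path);
    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
    String line;
    while ((line = reader.readLine()) != null) {
      System.out.println(line); // the sync'ed records are visible here
    }
    reader.close();
  }
}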

On Mon, Feb 25, 2013 at 12:38 PM, Lucas Bernardi  wrote:

> I didn't notice, thanks for the heads up.
>
>
> On Mon, Feb 25, 2013 at 4:31 AM, Harsh J  wrote:
>
>> Just an aside (I've not tried to look at the original issue yet), but
>> Scribe has not been maintained (nor has seen a release) in over a year
>> now -- looking at the commit history. Same case with both Facebook and
>> Twitter's fork.
>>
>> On Mon, Feb 25, 2013 at 7:16 AM, Lucas Bernardi  wrote:
>> > Yeah I looked at scribe, looks good but sounds like too much for my
>> problem.
>> > I'd rather make it work the simple way. Could you pleas post your code,
>> may
>> > be I'm doing something wrong on the sync side. Maybe a buffer size,
>> block
>> > size or some other  parameter is different...
>> >
>> > Thanks!
>> > Lucas
>> >
>> >
>> > On Sun, Feb 24, 2013 at 10:31 PM, Hemanth Yamijala
>> >  wrote:
>> >>
>> >> I am using the same version of Hadoop as you.
>> >>
>> >> Can you look at something like Scribe, which AFAIK fits the use case
>> you
>> >> describe.
>> >>
>> >> Thanks
>> >> Hemanth
>> >>
>> >>
>> >> On Sun, Feb 24, 2013 at 3:33 AM, Lucas Bernardi 
>> wrote:
>> >>>
>> >>> That is exactly what I did, but in my case, it is like if the file
>> were
>> >>> empty, the job counters say no bytes read.
>> >>> I'm using hadoop 1.0.3 which version did you try?
>> >>>
>> >>> What I'm trying to do is just some basic analyitics on a product
>> search
>> >>> system. There is a search service, every time a user performs a
>> search, the
>> >>> search string, and the results are stored in this file, and the file
>> is
>> >>> sync'ed. I'm actually using pig to do some basic counts, it doesn't
>> work,
>> >>> like I described, because the file looks empty for the map reduce
>> >>> components. I thought it was about pig, but I wasn't sure, so I tried
>> a
>> >>> simple mr job, and used the word count to test the map reduce
>> compoinents
>> >>> actually see the sync'ed bytes.
>> >>>
>> >>> Of course if I close the file, everything works perfectly, but I don't
>> >>> want to close the file every while, since that means I should create
>> another
>> >>> one (since no append support), and that would end up with too many
>> tiny
>> >>> files, something we know is bad for mr performance, and I don't want
>> to add
>> >>> more parts to this (like a file merging tool). I think unign sync is
>> a clean
>> >>> solution, since we don't care about writing performance, so I'd
>> rather keep
>> >>> it like this if I can make it work.
>> >>>
>> >>> Any idea besides hadoop version?
>> >>>
>> >>> Thanks!
>> >>>
>> >>> Lucas
>> >>>
>> >>>
>> >>>
>> >>> On Sat, Feb 23, 2013 at 11:54 AM, Hemanth Yamijala
>> >>>  wrote:
>> 
>>  Hi Lucas,
>> 
>>  I tried something like this but got different results.
>> 
>>  I wrote code that opened a file on HDFS, wrote a line and called
>> sync.
>>  Without closing the file, I ran a wordcount with that file as input.
>> It did
>>  work fine and was able to count the words that were sync'ed (even
>> though the
>>  file length seems to come as 0 like you noted in fs -ls)
>> 
>>  So, not sure what's happening in your case. In the MR job, do the job
>>  counters indicate no bytes were read ?
>> 
>>  On a different note though, if you can describe a little more what
>> you
>>  are trying to accomplish, we could probably work a better solution.
>> 
>>  Thanks
>>  hemanth
>> 
>> 
>>  On Sat, Feb 23, 2013 at 7:15 PM, Lucas Bernardi 
>>  wrote:
>> >
>> > Helo Hemanth, thanks for answering.
>> > The file is open by a separate process not map reduce related at
>> all.
>> > You can think of it as a servlet, receiving requests, and writing
>> them to
>> > this file, every time a request is received it is written and
>> > org.apache.hadoop.fs.FSDataOutputStream.sync() is invoked.
>> >
>> > At the same time, I want to run a map reduce job over this file.
>> Simply
>> > runing the word count example doesn't seem to work, it is like if
>> the file
>> > were empty.
>> >
>> > hadoop -fs -tail works just fine, and reading the file using
>> > org.apache.hadoop.fs.FSDataInputStream also works ok.
>> >
>> > Last thing, the web interface doesn't see the contents, and command
>> > hadoop -fs -ls says the file is empty.
>> >
>> > What am I doing wrong?
>> >
>> > Thanks!
>> >
>> > Lucas
>> >
>> >
>> >
>> > On Sat, Feb 23, 2013 at 4:37 AM, Hemanth Yamijala
>> >  wrote:
>> >>
>> >> Could you please clarify, are you opening the file in your mapper

Re: Hadoop efficient resource isolation

2013-02-25 Thread Jeffrey Buell
This is one reason to consider virtualizing Hadoop clusters. The idea is to 
create multiple virtual clusters on a single physical cluster and apply various 
kinds of resource controls (CPU, memory, I/O) on the virtual machines that make 
up each virtual cluster. Then if any application or VM within a virtual cluster 
crashes, hangs, or tries to hog resources, the other virtual clusters will be 
unaffected. Multi-tenancy is also enabled since the isolation between virtual 
clusters is secure. 

Jeff 

- Original Message -

From: "Hemanth Yamijala"  
To: user@hadoop.apache.org 
Sent: Thursday, February 21, 2013 8:51:04 AM 
Subject: Re: Hadoop efficient resource isolation 

Supporting a multiuser scenario like this is always hard under Hadoop. There 
are a few configuration knobs that offer some administrative control and 
protection. 

Specifically for the problem you describe, you could probably set 
mapred.{map|reduce}.child.ulimit on the tasktrackers, so that any job that 
is exceeding these limits will be killed. Of course, a side effect of this 
would be that jobs would be bound by some limits even if they legitimately 
require more memory. 


But you could try starting with this. 
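
For reference, a minimal mapred-site.xml sketch of that knob, assuming Hadoop 1.x
property names; the values are in kilobytes of virtual memory and purely illustrative:

<property>
  <!-- roughly a 2 GB virtual memory limit per map task (illustrative) -->
  <name>mapred.map.child.ulimit</name>
  <value>2097152</value>
</property>
<property>
  <!-- roughly a 4 GB virtual memory limit per reduce task (illustrative) -->
  <name>mapred.reduce.child.ulimit</name>
  <value>4194304</value>
</property>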


Thanks 
Hemanth 

On Thursday, February 21, 2013, Dhanasekaran Anbalagan wrote: 



Hi Guys, 


It's possible isolation job submission for hadoop cluster, we currently running 
48 machine cluster. we monitor Hadoop is not provides efficient resource 
isolation. In my case we ran for tech and research pool, When tech job some 
memory leak will haven, It's occupy the hole cluster. Finally we figure out 
issue with tech job. It's screwed up hole hadoop cluster. finally 10 data node 
are dead. 


Any prevention of job submission efficient way resource allocation. When 
something wrong in particular job, effect particular pool, Not effect others 
job. Any way to archive this 


Please guide me guys. 


My idea is, When tech user submit job means only apply job in for my case 
submit 24 machine. other machine only for research user. 

It's will prevent the memory leak problem. 




-Dhanasekaran. 

Did I learn something today? If not, I wasted it. 




Re: Format the harddrive

2013-02-25 Thread Jeffrey Buell
I've installed RHEL 6.1 with none of these problems. Not sure why you can't 
delete the logical volumes, but I suggest not letting the installer do 
automatic disk configuration. Then you can manually select which disks you want 
for the root partitions, their sizes, and how you want them formatted. 
Configure other disks for Hadoop data later, after OS install. 

Jeff 

- Original Message -

From: "Rahul1 Shah"  
To: user@hadoop.apache.org 
Sent: Monday, February 25, 2013 12:33:17 PM 
Subject: Format the harddrive 



Hi, 

I am installing Hadoop on some systems. For this I want to format the hard 
drive for any previous RAID format or any kind of data. I am facing this 
problem that when I install Redhat 6.2 on these systems it creates a logical 
volume on the disk and does not let me create ext4 partitions on them. Any idea 
how do I format all the disk. 

-Rahul 



Format the harddrive

2013-02-25 Thread Shah, Rahul1
Hi,

I am installing Hadoop on some systems. For this I want to format the hard
drives to remove any previous RAID format or any kind of data. I am facing the
problem that when I install Red Hat 6.2 on these systems it creates a logical
volume on the disk and does not let me create ext4 partitions on them. Any idea
how I can format all the disks?

-Rahul



Re: How to find Replication factor for one perticular folder in HDFS

2013-02-25 Thread Nitin Pawar
see if the link below helps you

http://www.michael-noll.com/blog/2011/10/20/understanding-hdfs-quotas-and-hadoop-fs-and-fsck-tools/


On Mon, Feb 25, 2013 at 10:36 PM, Dhanasekaran Anbalagan  wrote:

> Hi Guys,
>
> How to query particular folder witch replication factor configured. In my
> cluster some folder in HDFS configured 2 and some of them configured as
> three. How to query.
>
> please guide me
>
> -Dhanasekaran
>
> Did I learn something today? If not, I wasted it.
>



-- 
Nitin Pawar


Getting Giraph to build/run with CDH4 (cannot initialize cluster - check mapreduce.framework.name)

2013-02-25 Thread David Boyd

All:
   I am trying to get the Giraph 0.2 snapshot (pulled via GIT on Friday)
to build and run with CDH4.

I modified the pom.xml to provide a profile for my specific version (4.1.1).
The build works (mvn -Phadoop_cdh4.1.1 clean package test) and passes
all the tests.

If I try to do the next step and submit to my cluster with the command:
mvn -Phadoop_cdh4.1.1 test -Dprop.mapred.job.tracker=10.1.94.53:8021 
-Dgiraph.zkList=10.1.94.104:2181


 the JSON test in core fails.  If I move that test out of the way a 
whole bunch of tests in examples

fail.  They all fail with:

java.io.IOException: Cannot initialize Cluster. Please check your
configuration for mapreduce.framework.name and the correspond server
addresses.


I have tried passing mapreduce.framework.name as both local and classic. 
  I have also set those values in my mapreduce-site.xml.


Interestingly I can run the pagerank benchmark in code with the command:

hadoop jar
./giraph-core/target/giraph-0.2-SNAPSHOT-for-hadoop-2.0.0-cdh4.1.3-jar-with-dependencies.jar
org.apache.giraph.benchmark.PageRankBenchmark
-Dmapred.child.java-opts="-Xmx64g -Xms64g XX:+UseConcMarkSweepGC
-XX:-UseGCOverheadLimit" -Dgiraph.zkList=10.1.94.104:2181 -e 1 -s 3 -v
-V 5 -w 83

And it completes just fine.

I have searched high and low for documents and examples on how to run 
the example programs from other

than maven but have not found any thing.

Any help or suggestions  would be greatly appreciated.

THanks.



--
= mailto:db...@data-tactics.com 
David W. Boyd
Director, Engineering, Research and Development
Data Tactics Corporation
7901 Jones Branch, Suite 240
Mclean, VA 22102
office:   +1-703-506-3735, ext 308
fax: +1-703-506-6703
cell: +1-703-402-7908
== http://www.data-tactics.com/ 

The information contained in this message may be privileged
and/or confidential and protected from disclosure.
If the reader of this message is not the intended recipient
or an employee or agent responsible for delivering this message
to the intended recipient, you are hereby notified that any
dissemination, distribution or copying of this communication
is strictly prohibited.  If you have received this communication
in error, please notify the sender immediately by replying to
this message and deleting the material from any computer.




Re: map reduce and sync

2013-02-25 Thread Lucas Bernardi
I didn't notice, thanks for the heads up.

On Mon, Feb 25, 2013 at 4:31 AM, Harsh J  wrote:

> Just an aside (I've not tried to look at the original issue yet), but
> Scribe has not been maintained (nor has seen a release) in over a year
> now -- looking at the commit history. Same case with both Facebook and
> Twitter's fork.
>
> On Mon, Feb 25, 2013 at 7:16 AM, Lucas Bernardi  wrote:
> > Yeah I looked at scribe, looks good but sounds like too much for my
> problem.
> > I'd rather make it work the simple way. Could you pleas post your code,
> may
> > be I'm doing something wrong on the sync side. Maybe a buffer size, block
> > size or some other  parameter is different...
> >
> > Thanks!
> > Lucas
> >
> >
> > On Sun, Feb 24, 2013 at 10:31 PM, Hemanth Yamijala
> >  wrote:
> >>
> >> I am using the same version of Hadoop as you.
> >>
> >> Can you look at something like Scribe, which AFAIK fits the use case you
> >> describe.
> >>
> >> Thanks
> >> Hemanth
> >>
> >>
> >> On Sun, Feb 24, 2013 at 3:33 AM, Lucas Bernardi 
> wrote:
> >>>
> >>> That is exactly what I did, but in my case, it is like if the file were
> >>> empty, the job counters say no bytes read.
> >>> I'm using hadoop 1.0.3 which version did you try?
> >>>
> >>> What I'm trying to do is just some basic analyitics on a product search
> >>> system. There is a search service, every time a user performs a
> search, the
> >>> search string, and the results are stored in this file, and the file is
> >>> sync'ed. I'm actually using pig to do some basic counts, it doesn't
> work,
> >>> like I described, because the file looks empty for the map reduce
> >>> components. I thought it was about pig, but I wasn't sure, so I tried a
> >>> simple mr job, and used the word count to test the map reduce
> compoinents
> >>> actually see the sync'ed bytes.
> >>>
> >>> Of course if I close the file, everything works perfectly, but I don't
> >>> want to close the file every while, since that means I should create
> another
> >>> one (since no append support), and that would end up with too many tiny
> >>> files, something we know is bad for mr performance, and I don't want
> to add
> >>> more parts to this (like a file merging tool). I think unign sync is a
> clean
> >>> solution, since we don't care about writing performance, so I'd rather
> keep
> >>> it like this if I can make it work.
> >>>
> >>> Any idea besides hadoop version?
> >>>
> >>> Thanks!
> >>>
> >>> Lucas
> >>>
> >>>
> >>>
> >>> On Sat, Feb 23, 2013 at 11:54 AM, Hemanth Yamijala
> >>>  wrote:
> 
>  Hi Lucas,
> 
>  I tried something like this but got different results.
> 
>  I wrote code that opened a file on HDFS, wrote a line and called sync.
>  Without closing the file, I ran a wordcount with that file as input.
> It did
>  work fine and was able to count the words that were sync'ed (even
> though the
>  file length seems to come as 0 like you noted in fs -ls)
> 
>  So, not sure what's happening in your case. In the MR job, do the job
>  counters indicate no bytes were read ?
> 
>  On a different note though, if you can describe a little more what you
>  are trying to accomplish, we could probably work a better solution.
> 
>  Thanks
>  hemanth
> 
> 
>  On Sat, Feb 23, 2013 at 7:15 PM, Lucas Bernardi 
>  wrote:
> >
> > Helo Hemanth, thanks for answering.
> > The file is open by a separate process not map reduce related at all.
> > You can think of it as a servlet, receiving requests, and writing
> them to
> > this file, every time a request is received it is written and
> > org.apache.hadoop.fs.FSDataOutputStream.sync() is invoked.
> >
> > At the same time, I want to run a map reduce job over this file.
> Simply
> > runing the word count example doesn't seem to work, it is like if
> the file
> > were empty.
> >
> > hadoop -fs -tail works just fine, and reading the file using
> > org.apache.hadoop.fs.FSDataInputStream also works ok.
> >
> > Last thing, the web interface doesn't see the contents, and command
> > hadoop -fs -ls says the file is empty.
> >
> > What am I doing wrong?
> >
> > Thanks!
> >
> > Lucas
> >
> >
> >
> > On Sat, Feb 23, 2013 at 4:37 AM, Hemanth Yamijala
> >  wrote:
> >>
> >> Could you please clarify, are you opening the file in your mapper
> code
> >> and reading from there ?
> >>
> >> Thanks
> >> Hemanth
> >>
> >> On Friday, February 22, 2013, Lucas Bernardi wrote:
> >>>
> >>> Hello there, I'm trying to use hadoop map reduce to process an open
> >>> file. The writing process, writes a line to the file and syncs the
> file to
> >>> readers.
> >>> (org.apache.hadoop.fs.FSDataOutputStream.sync()).
> >>>
> >>> If I try to read the file from another process, it works fine, at
> >>> least using
> >>> org.apache.hadoop.fs.FSDataI

Re: Hadoop advantages vs Traditional Relational DB

2013-02-25 Thread Panshul Whisper
Because Hadoop is not a DBMS.
On Feb 25, 2013 3:16 PM, "Oleg Ruchovets"  wrote:

> Why not ?
>
>
> On Mon, Feb 25, 2013 at 3:57 PM, dwivedishashwat <
> dwivedishash...@gmail.com> wrote:
>
>> Better not to compare
>>
>>
>>
>>
>> Sent from Samsung Galaxy Note
>>
>> Oleg Ruchovets  wrote:
>> Yes , Sure I ased uncle google first :-). I already saw these links , but
>> it is a difference between hadoop ans traditional DB.  I am looking for
>> hadoop advantages vs traditional DB.
>>
>> Thanks
>> Oleg.
>>
>>
>> On Mon, Feb 25, 2013 at 3:31 PM, Jean-Marc Spaggiari <
>> jean-m...@spaggiari.org> wrote:
>>
>>> Hi Oleg,
>>>
>>> Have you asked google first?
>>>
>>> http://indico.cern.ch/conferenceDisplay.py?confId=162202
>>> http://www.wikidifference.com/difference-between-hadoop-and-rdbms/
>>> http://iablog.sybase.com/paulley/2008/11/hadoop-vs-relational-databases/
>>>
>>> JM
>>>
>>>
>>> 2013/2/25 Oleg Ruchovets 
>>>
 Hi ,
Can you please share hadoop advantages vs Traditional Relational DB.
 A link to short presentations or benchmarks would be great.

 Thanks.
 Oleg.

>>>
>>>
>>>
>>
>


Re: How do _you_ document your hadoop jobs?

2013-02-25 Thread Jay Vyas
Wow, that's very heavyweight and difficult to modify. Why not Graphviz, or
generating the diagrams from some other text format?



On Feb 25, 2013, at 4:11 AM, "David Parks"  wrote:

> We’ve taken to documenting our Hadoop jobs in a simple visual manner using 
> PPT (attached example). I wonder how others document their jobs?
>  
> We often add notes to the text section of the PPT slides as well.
>  
> 


Re: How to start Data Replicated Blocks in HDFS manually.

2013-02-25 Thread Nitin Pawar
as shashwat suggested  run the command

$HADOOP_HOME/bin/hadoop dfs -setrep -R -w 2 /


On Mon, Feb 25, 2013 at 8:04 PM, Dhanasekaran Anbalagan
wrote:

> HI Nitin
>
> >>did you start the cluster with replication factor 3 and later changed it
> to 2
> yes correct. We have only two datanode, First  I wrongly configured after
> I changed  to 2.
> and i have done with hadoop fs -setrep -R 2 /myfolder
>
> same thing i done in all folder in HDFS
> >>did you enable rack awareness
> no.
>
> Did I learn something today? If not, I wasted it.
>
>
> On Mon, Feb 25, 2013 at 9:20 AM, Nitin Pawar wrote:
>
>> did you start the cluster with replication factor 3 and later changed it
>> to 2?
>> also did you enable rack awareness in your configs and both the nodes are
>> on same rack?
>>
>>
>>
>>
>> On Mon, Feb 25, 2013 at 7:45 PM, Dhanasekaran Anbalagan <
>> bugcy...@gmail.com> wrote:
>>
>>> Hi Guys,
>>>
>>> We have cluster with two data nodes. We configured data replication
>>> factor two.
>>> when  i copy data  to hdfs, Data's are not fully replicated. It's says * 
>>> Number
>>> of Under-Replicated Blocks : 15115*
>>> How to manually invoke the Data replication in HDFS.
>>>
>>> I restarted cluster also. It's not helps me
>>>
>>> Please guide me guys.
>>>
>>> -Dhanasekaran.
>>>
>>> Did I learn something today? If not, I wasted it.
>>>
>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>


-- 
Nitin Pawar


Re: How to start Data Replicated Blocks in HDFS manually.

2013-02-25 Thread Dhanasekaran Anbalagan
HI Nitin

>>did you start the cluster with replication factor 3 and later changed it
to 2
Yes, correct. We have only two datanodes; I configured it wrongly at first and
then changed it to 2,
and I have run hadoop fs -setrep -R 2 /myfolder.

I did the same thing for all folders in HDFS.
>>did you enable rack awareness
no.

Did I learn something today? If not, I wasted it.


On Mon, Feb 25, 2013 at 9:20 AM, Nitin Pawar wrote:

> did you start the cluster with replication factor 3 and later changed it
> to 2?
> also did you enable rack awareness in your configs and both the nodes are
> on same rack?
>
>
>
>
> On Mon, Feb 25, 2013 at 7:45 PM, Dhanasekaran Anbalagan <
> bugcy...@gmail.com> wrote:
>
>> Hi Guys,
>>
>> We have cluster with two data nodes. We configured data replication
>> factor two.
>> when  i copy data  to hdfs, Data's are not fully replicated. It's says * 
>> Number
>> of Under-Replicated Blocks : 15115*
>> How to manually invoke the Data replication in HDFS.
>>
>> I restarted cluster also. It's not helps me
>>
>> Please guide me guys.
>>
>> -Dhanasekaran.
>>
>> Did I learn something today? If not, I wasted it.
>>
>
>
>
> --
> Nitin Pawar
>


Re: How to start Data Replicated Blocks in HDFS manually.

2013-02-25 Thread shashwat shriparv
The problem may be the default replication factor, which is 3. First option:
check in hdfs-site.xml whether the replication factor is specified; if it is
not, add that parameter and restart the cluster.

Second option: change the replication factor of the root directory of HDFS to
2 using the following command:

bin/hadoop dfs -setrep -R -w 2 /

This will change the replication factor to 2.

This problem may also be because you have two datanodes while the replication
factor is 3; think of the scenario where you have two buckets and three
objects to keep.
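As a sketch of the first option, the hdfs-site.xml entry (standard property
name in Hadoop 1.x) would be:

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

Note this only sets the default for newly written files; files already in HDFS
keep their recorded factor, which is what the setrep command above fixes.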




∞
Shashwat Shriparv



On Mon, Feb 25, 2013 at 7:50 PM, Nitin Pawar wrote:

> did you start the cluster with replication factor 3 and later changed it
> to 2?
> also did you enable rack awareness in your configs and both the nodes are
> on same rack?
>
>
>
>
> On Mon, Feb 25, 2013 at 7:45 PM, Dhanasekaran Anbalagan <
> bugcy...@gmail.com> wrote:
>
>> Hi Guys,
>>
>> We have a cluster with two data nodes. We configured the data replication
>> factor to two.
>> When I copy data to HDFS, the data is not fully replicated. It says "Number
>> of Under-Replicated Blocks : 15115".
>> How do I manually invoke data replication in HDFS?
>>
>> I restarted the cluster too, but it did not help.
>>
>> Please guide me guys.
>>
>> -Dhanasekaran.
>>
>> Did I learn something today? If not, I wasted it.
>>
>
>
>
> --
> Nitin Pawar
>


Re: How to start Data Replicated Blocks in HDFS manually.

2013-02-25 Thread Nitin Pawar
Did you start the cluster with replication factor 3 and later change it to
2?
Also, did you enable rack awareness in your configs, and are both nodes
on the same rack?




On Mon, Feb 25, 2013 at 7:45 PM, Dhanasekaran Anbalagan
wrote:

> Hi Guys,
>
> We have a cluster with two data nodes. We configured the data replication
> factor to two.
> When I copy data to HDFS, the data is not fully replicated. It says "Number
> of Under-Replicated Blocks : 15115".
> How do I manually invoke data replication in HDFS?
>
> I restarted the cluster too, but it did not help.
>
> Please guide me guys.
>
> -Dhanasekaran.
>
> Did I learn something today? If not, I wasted it.
>



-- 
Nitin Pawar


Re: Hadoop advantages vs Traditional Relational DB

2013-02-25 Thread Oleg Ruchovets
Why not ?


On Mon, Feb 25, 2013 at 3:57 PM, dwivedishashwat
wrote:

> Better not to compare
>
>
>
>
> Sent from Samsung Galaxy Note
>
> Oleg Ruchovets  wrote:
> Yes, sure, I asked uncle Google first :-). I already saw these links, but
> they describe the differences between Hadoop and a traditional DB. I am
> looking for Hadoop's advantages vs a traditional DB.
>
> Thanks
> Oleg.
>
>
> On Mon, Feb 25, 2013 at 3:31 PM, Jean-Marc Spaggiari <
> jean-m...@spaggiari.org> wrote:
>
>> Hi Oleg,
>>
>> Have you asked google first?
>>
>> http://indico.cern.ch/conferenceDisplay.py?confId=162202
>> http://www.wikidifference.com/difference-between-hadoop-and-rdbms/
>> http://iablog.sybase.com/paulley/2008/11/hadoop-vs-relational-databases/
>>
>> JM
>>
>>
>> 2013/2/25 Oleg Ruchovets 
>>
>>> Hi ,
>>>Can you please share hadoop advantages vs Traditional Relational DB.
>>> A link to short presentations or benchmarks would be great.
>>>
>>> Thanks.
>>> Oleg.
>>>
>>
>>
>>
>


Re: Hadoop advantages vs Traditional Relational DB

2013-02-25 Thread dwivedishashwat
Better not to compare




Sent from Samsung Galaxy Note

Oleg Ruchovets wrote: Yes, sure, I asked uncle Google first :-). I already saw
these links, but they describe the differences between Hadoop and a traditional
DB. I am looking for Hadoop's advantages vs a traditional DB.
 
Thanks
Oleg.


On Mon, Feb 25, 2013 at 3:31 PM, Jean-Marc Spaggiari  
wrote:
Hi Oleg,

Have you asked google first?

http://indico.cern.ch/conferenceDisplay.py?confId=162202
http://www.wikidifference.com/difference-between-hadoop-and-rdbms/
http://iablog.sybase.com/paulley/2008/11/hadoop-vs-relational-databases/

JM


2013/2/25 Oleg Ruchovets 
Hi , 
   Can you please share hadoop advantages vs Traditional Relational DB.
A link to short presentations or benchmarks would be great.

Thanks.
Oleg.





RE: Hadoop advantages vs Traditional Relational DB

2013-02-25 Thread Michael Namaiandeh
Or Google's cousin.

http://www.lmgtfy.com/



From: Oleg Ruchovets [mailto:oruchov...@gmail.com]
Sent: Monday, February 25, 2013 8:38 AM
To: user@hadoop.apache.org
Subject: Re: Hadoop advantages vs Traditional Relational DB

Yes, sure, I asked uncle Google first :-). I already saw these links, but they
describe the differences between Hadoop and a traditional DB. I am looking for
Hadoop's advantages vs a traditional DB.

Thanks
Oleg.

On Mon, Feb 25, 2013 at 3:31 PM, Jean-Marc Spaggiari 
<jean-m...@spaggiari.org> wrote:
Hi Oleg,

Have you asked google first?

http://indico.cern.ch/conferenceDisplay.py?confId=162202
http://www.wikidifference.com/difference-between-hadoop-and-rdbms/
http://iablog.sybase.com/paulley/2008/11/hadoop-vs-relational-databases/

JM

2013/2/25 Oleg Ruchovets <oruchov...@gmail.com>
Hi ,
   Can you please share hadoop advantages vs Traditional Relational DB.
A link to short presentations or benchmarks would be great.

Thanks.
Oleg.




Re: Hadoop advantages vs Traditional Relational DB

2013-02-25 Thread Oleg Ruchovets
Yes, sure, I asked uncle Google first :-). I already saw these links, but
they describe the differences between Hadoop and a traditional DB. I am
looking for Hadoop's advantages vs a traditional DB.

Thanks
Oleg.


On Mon, Feb 25, 2013 at 3:31 PM, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:

> Hi Oleg,
>
> Have you asked google first?
>
> http://indico.cern.ch/conferenceDisplay.py?confId=162202
> http://www.wikidifference.com/difference-between-hadoop-and-rdbms/
> http://iablog.sybase.com/paulley/2008/11/hadoop-vs-relational-databases/
>
> JM
>
>
> 2013/2/25 Oleg Ruchovets 
>
>> Hi ,
>>Can you please share hadoop advantages vs Traditional Relational DB.
>> A link to short presentations or benchmarks would be great.
>>
>> Thanks.
>> Oleg.
>>
>
>
>


Re: Hadoop advantages vs Traditional Relational DB

2013-02-25 Thread Jean-Marc Spaggiari
Hi Oleg,

Have you asked google first?

http://indico.cern.ch/conferenceDisplay.py?confId=162202
http://www.wikidifference.com/difference-between-hadoop-and-rdbms/
http://iablog.sybase.com/paulley/2008/11/hadoop-vs-relational-databases/

JM

2013/2/25 Oleg Ruchovets 

> Hi ,
>Can you please share hadoop advantages vs Traditional Relational DB.
> A link to short presentations or benchmarks would be great.
>
> Thanks.
> Oleg.
>


Hadoop advantages vs Traditional Relational DB

2013-02-25 Thread Oleg Ruchovets
Hi ,
   Can you please share hadoop advantages vs Traditional Relational DB.
A link to short presentations or benchmarks would be great.

Thanks.
Oleg.


Re: adding space on existing datanode ?

2013-02-25 Thread Jean-Marc Spaggiari
Hi Brice,

Why are you saying it's incrementing replication? Is anything documented
anywhere that is leading you in the wrong direction? Bejoy below is right:
the replication factor is not changed by the addition of a new directory
under dfs.data.dir. This will "simply" divide the load on this specific
datanode between all the directories you specified.
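For illustration only, a minimal hdfs-site.xml sketch with two directories on
one datanode (the paths here are made up) looks like:

<property>
  <name>dfs.data.dir</name>
  <value>/data/disk1/dfs/data,/data/disk2/dfs/data</value>
</property>

The comma-separated list is extra storage for that single datanode;
dfs.replication is a separate setting and is left untouched.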

JM

2013/2/25 

> Hi Brice
>
> By adding a new storage location to dfs.data.dir you are not incrementing
> the replication factor.
>
> You are giving one more location for the blocks to be copied for that data
> node.
>
> There is no new DataNode added. A new data node would be live only if you
> tweak your configs and start a new DataNode daemon.
>
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> --
> From: brice lecomte
> Date: Mon, 25 Feb 2013 09:50:29 +0100
> To:
> Reply-To: user@hadoop.apache.org
> Subject: Re: adding space on existing datanode ?
>
> Thanks for your reply, I'm running 1.1.1, hence dfs.data.dir looks to be
> the right property to add, but doing so, it would add another complete
> datanode (incrementing dfs.replication by 1) whereas here, I'd like just to
> "extend" an existing one. Am I wrong ?
>
>
> On 22/02/2013 19:56, Patai Sangbutsarakum wrote:
>
> Just want to add up from JM.
>
>  If you already have balancer run in cluster every day, that will help
> the new drive(s) get balanced.
>
>  P
>
>   From: Jean-Marc Spaggiari 
> Reply-To: 
> Date: Fri, 22 Feb 2013 13:14:14 -0500
> To: 
> Subject: Re: adding space on existing datanode ?
>
>   To add disk space to your datanode you simply need to add another drive,
> then add it to the dfs.data.dir or dfs.datanode.data.dir entry. After a
> datanode restart, hadoop will start to use it.
>
> It will not balance the existing data between the directories. It will
> continue to add to the 2. If one goes full, it will only continue with the
> other one. If required, you can balance the data manually. Or depending on
> your use case and the options you have, you can stop the datanode, delete
> the content of the 2 data directories and restart it. It will start to
> receive data to duplicate and will share it evenly between the 2
> directories. This last solution is not recommended. But for a test
> environment it might be easier.
>
>
>


Re: Datanodes shutdown and HBase's regionservers not working

2013-02-25 Thread Nicolas Liochon
Network error messages are not always friendly, especially if there is a
misconfiguration.
This said, "connection refused" says that the remote box was reachable on the
network, but that the remote port was not open on it, i.e. the process
was dead.
It could be useful to pastebin the whole logs as well...
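Two quick things to check on an affected box (assuming a Linux host): run jps
to see whether the DataNode JVM is still listed, and something like
netstat -tlnp | grep 50010 (50010 is the default data transfer port that shows
up in your logs) to see whether the port is actually listening. If the JVM is
listed but the port is not, the DataNode log around that time is the
interesting part.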


On Mon, Feb 25, 2013 at 12:44 PM, Davey Yan  wrote:

> But... there was no log like "network unreachable".
>
>
> On Mon, Feb 25, 2013 at 6:07 PM, Nicolas Liochon 
> wrote:
> > I agree.
> > Then for HDFS, ...
> > The first thing to check is the network I would say.
> >
> >
> >
> >
> > On Mon, Feb 25, 2013 at 10:46 AM, Davey Yan  wrote:
> >>
> >> Thanks for reply, Nicolas.
> >>
> >> My question: What can lead to shutdown of all of the datanodes?
> >> I believe that the regionservers will be OK if the HDFS is OK.
> >>
> >>
> >> On Mon, Feb 25, 2013 at 5:31 PM, Nicolas Liochon 
> >> wrote:
> >> > Ok, what's your question?
> >> > When you say the datanode went down, was it the datanode processes or
> >> > the
> >> > machines, with both the datanodes and the regionservers?
> >> >
> >> > The NameNode pings its datanodes every 3 seconds. However it will
> >> > internally
> >> > mark the datanodes as dead after 10:30 minutes (even if in the gui you
> >> > have
> >> > 'no answer for x minutes').
> >> > HBase monitoring is done by ZooKeeper. By default, a regionserver is
> >> > considered as dead after 180s with no answer. Before, well, it's
> >> > considered
> >> > as live.
> >> > When you stop a regionserver, it tries to flush its data to the disk
> >> > (i.e.
> >> > hdfs, i.e. the datanodes). That's why if you have no datanodes, or if
> a
> >> > high
> >> > ratio of your datanodes are dead, it can't shutdown. Connection
> refused
> >> > &
> >> > socket timeouts come from the fact that before the 10:30 minutes hdfs
> >> > does
> >> > not declare the nodes as dead, so hbase tries to use them (and,
> >> > obviously,
> >> > fails). Note that there is now  an intermediate state for hdfs
> >> > datanodes,
> >> > called "stale": an intermediary state where the datanode is used only
> if
> >> > you
> >> > have to (i.e. it's the only datanode with a block replica you need).
> It
> >> > will
> >> > be documented in HBase for the 0.96 release. But if all your datanodes
> >> > are
> >> > down it won't change much.
> >> >
> >> > Cheers,
> >> >
> >> > Nicolas
> >> >
> >> >
> >> >
> >> > On Mon, Feb 25, 2013 at 10:10 AM, Davey Yan 
> wrote:
> >> >>
> >> >> Hey guys,
> >> >>
> >> >> We have a cluster with 5 nodes(1 NN and 4 DNs) running for more than
> 1
> >> >> year, and it works fine.
> >> >> But the datanodes got shutdown twice in the last month.
> >> >>
> >> >> When the datanodes got shutdown, all of them became "Dead Nodes" in
> >> >> the NN web admin UI(http://ip:50070/dfshealth.jsp),
> >> >> but regionservers of HBase were still live in the HBase web
> >> >> admin(http://ip:60010/master-status), of course, they were zombies.
> >> >> All of the processes of jvm were still running, including
> >> >> hmaster/namenode/regionserver/datanode.
> >> >>
> >> >> When the datanodes got shutdown, the load (using the "top" command)
> of
> >> >> slaves became very high, more than 10, higher than normal running.
> >> >> From the "top" command, we saw that the processes of datanode and
> >> >> regionserver were consuming CPU.
> >> >>
> >> >> We could not stop the HBase or Hadoop cluster through normal
> >> >> commands(stop-*.sh/*-daemon.sh stop *).
> >> >> So we stopped datanodes and regionservers by kill -9 PID, then the
> >> >> load of slaves returned to normal level, and we start the cluster
> >> >> again.
> >> >>
> >> >>
> >> >> Log of NN at the shutdown point(All of the DNs were removed):
> >> >> 2013-02-22 11:10:02,278 INFO org.apache.hadoop.net.NetworkTopology:
> >> >> Removing a node: /default-rack/192.168.1.152:50010
> >> >> 2013-02-22 11:10:02,278 INFO org.apache.hadoop.hdfs.StateChange:
> >> >> BLOCK* NameSystem.heartbeatCheck: lost heartbeat from
> >> >> 192.168.1.149:50010
> >> >> 2013-02-22 11:10:02,693 INFO org.apache.hadoop.net.NetworkTopology:
> >> >> Removing a node: /default-rack/192.168.1.149:50010
> >> >> 2013-02-22 11:10:02,693 INFO org.apache.hadoop.hdfs.StateChange:
> >> >> BLOCK* NameSystem.heartbeatCheck: lost heartbeat from
> >> >> 192.168.1.150:50010
> >> >> 2013-02-22 11:10:03,004 INFO org.apache.hadoop.net.NetworkTopology:
> >> >> Removing a node: /default-rack/192.168.1.150:50010
> >> >> 2013-02-22 11:10:03,004 INFO org.apache.hadoop.hdfs.StateChange:
> >> >> BLOCK* NameSystem.heartbeatCheck: lost heartbeat from
> >> >> 192.168.1.148:50010
> >> >> 2013-02-22 11:10:03,339 INFO org.apache.hadoop.net.NetworkTopology:
> >> >> Removing a node: /default-rack/192.168.1.148:50010
> >> >>
> >> >>
> >> >> Logs in DNs indicated there were many IOException and
> >> >> SocketTimeoutException:
> >> >> 2013-02-22 11:02:52,354 ERROR
> >> >> org.apache.hadoop.hdfs.server.datanode.DataNode:
> >> >> DatanodeRegistration(

Re: Datanodes shutdown and HBase's regionservers not working

2013-02-25 Thread Davey Yan
But... there was no log like "network unreachable".


On Mon, Feb 25, 2013 at 6:07 PM, Nicolas Liochon  wrote:
> I agree.
> Then for HDFS, ...
> The first thing to check is the network I would say.
>
>
>
>
> On Mon, Feb 25, 2013 at 10:46 AM, Davey Yan  wrote:
>>
>> Thanks for reply, Nicolas.
>>
>> My question: What can lead to shutdown of all of the datanodes?
>> I believe that the regionservers will be OK if the HDFS is OK.
>>
>>
>> On Mon, Feb 25, 2013 at 5:31 PM, Nicolas Liochon 
>> wrote:
>> > Ok, what's your question?
>> > When you say the datanode went down, was it the datanode processes or
>> > the
>> > machines, with both the datanodes and the regionservers?
>> >
>> > The NameNode pings its datanodes every 3 seconds. However it will
>> > internally
>> > mark the datanodes as dead after 10:30 minutes (even if in the gui you
>> > have
>> > 'no answer for x minutes').
>> > HBase monitoring is done by ZooKeeper. By default, a regionserver is
>> > considered as dead after 180s with no answer. Before, well, it's
>> > considered
>> > as live.
>> > When you stop a regionserver, it tries to flush its data to the disk
>> > (i.e.
>> > hdfs, i.e. the datanodes). That's why if you have no datanodes, or if a
>> > high
>> > ratio of your datanodes are dead, it can't shutdown. Connection refused
>> > &
>> > socket timeouts come from the fact that before the 10:30 minutes hdfs
>> > does
>> > not declare the nodes as dead, so hbase tries to use them (and,
>> > obviously,
>> > fails). Note that there is now  an intermediate state for hdfs
>> > datanodes,
>> > called "stale": an intermediary state where the datanode is used only if
>> > you
>> > have to (i.e. it's the only datanode with a block replica you need). It
>> > will
>> > be documented in HBase for the 0.96 release. But if all your datanodes
>> > are
>> > down it won't change much.
>> >
>> > Cheers,
>> >
>> > Nicolas
>> >
>> >
>> >
>> > On Mon, Feb 25, 2013 at 10:10 AM, Davey Yan  wrote:
>> >>
>> >> Hey guys,
>> >>
>> >> We have a cluster with 5 nodes(1 NN and 4 DNs) running for more than 1
>> >> year, and it works fine.
>> >> But the datanodes got shutdown twice in the last month.
>> >>
>> >> When the datanodes got shutdown, all of them became "Dead Nodes" in
>> >> the NN web admin UI(http://ip:50070/dfshealth.jsp),
>> >> but regionservers of HBase were still live in the HBase web
>> >> admin(http://ip:60010/master-status), of course, they were zombies.
>> >> All of the processes of jvm were still running, including
>> >> hmaster/namenode/regionserver/datanode.
>> >>
>> >> When the datanodes got shutdown, the load (using the "top" command) of
>> >> slaves became very high, more than 10, higher than normal running.
>> >> From the "top" command, we saw that the processes of datanode and
>> >> regionserver were consuming CPU.
>> >>
>> >> We could not stop the HBase or Hadoop cluster through normal
>> >> commands(stop-*.sh/*-daemon.sh stop *).
>> >> So we stopped datanodes and regionservers by kill -9 PID, then the
>> >> load of slaves returned to normal level, and we start the cluster
>> >> again.
>> >>
>> >>
>> >> Log of NN at the shutdown point(All of the DNs were removed):
>> >> 2013-02-22 11:10:02,278 INFO org.apache.hadoop.net.NetworkTopology:
>> >> Removing a node: /default-rack/192.168.1.152:50010
>> >> 2013-02-22 11:10:02,278 INFO org.apache.hadoop.hdfs.StateChange:
>> >> BLOCK* NameSystem.heartbeatCheck: lost heartbeat from
>> >> 192.168.1.149:50010
>> >> 2013-02-22 11:10:02,693 INFO org.apache.hadoop.net.NetworkTopology:
>> >> Removing a node: /default-rack/192.168.1.149:50010
>> >> 2013-02-22 11:10:02,693 INFO org.apache.hadoop.hdfs.StateChange:
>> >> BLOCK* NameSystem.heartbeatCheck: lost heartbeat from
>> >> 192.168.1.150:50010
>> >> 2013-02-22 11:10:03,004 INFO org.apache.hadoop.net.NetworkTopology:
>> >> Removing a node: /default-rack/192.168.1.150:50010
>> >> 2013-02-22 11:10:03,004 INFO org.apache.hadoop.hdfs.StateChange:
>> >> BLOCK* NameSystem.heartbeatCheck: lost heartbeat from
>> >> 192.168.1.148:50010
>> >> 2013-02-22 11:10:03,339 INFO org.apache.hadoop.net.NetworkTopology:
>> >> Removing a node: /default-rack/192.168.1.148:50010
>> >>
>> >>
>> >> Logs in DNs indicated there were many IOException and
>> >> SocketTimeoutException:
>> >> 2013-02-22 11:02:52,354 ERROR
>> >> org.apache.hadoop.hdfs.server.datanode.DataNode:
>> >> DatanodeRegistration(192.168.1.148:50010,
>> >> storageID=DS-970284113-117.25.149.160-50010-1328074119937,
>> >> infoPort=50075, ipcPort=50020):DataXceiver
>> >> java.io.IOException: Interrupted receiveBlock
>> >> at
>> >>
>> >> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:577)
>> >> at
>> >>
>> >> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:398)
>> >> at
>> >>
>> >> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:107)
>> >> at java.lang.Thread.run(Thread.java:662)
>> >> 2013-02-22 11:03:44

Re: ISSUE IN CDH4.1.2 : transfer data between different HDFS clusters.(using distch)

2013-02-25 Thread samir das mohapatra
I am using CDH4.1.2 with MRv1 not YARN.


On Mon, Feb 25, 2013 at 3:47 PM, samir das mohapatra <
samir.help...@gmail.com> wrote:

> yes
>
>
> On Mon, Feb 25, 2013 at 3:30 PM, Nitin Pawar wrote:
>
>> does this match with your issue
>>
>>
>> https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/kIPOvrFaQE8
>>
>>
>> On Mon, Feb 25, 2013 at 3:20 PM, samir das mohapatra <
>> samir.help...@gmail.com> wrote:
>>
>>>
>>>
>>> -- Forwarded message --
>>> From: samir das mohapatra 
>>> Date: Mon, Feb 25, 2013 at 3:05 PM
>>> Subject: ISSUE IN CDH4.1.2 : transfer data between different HDFS
>>> clusters.(using distch)
>>> To: cdh-u...@cloudera.org
>>>
>>>
>>> Hi All,
>>>   I am getting the below error; can anyone help me with this issue?
>>>
>>> ERROR LOG:
>>> --
>>>
>>> hadoop@hadoophost2:~$ hadoop   distcp hdfs://
>>> 10.192.200.170:50070/tmp/samir.txt hdfs://10.192.244.237:50070/input
>>> 13/02/25 01:34:36 INFO tools.DistCp: srcPaths=[hdfs://
>>> 10.192.200.170:50070/tmp/samir.txt]
>>> 13/02/25 01:34:36 INFO tools.DistCp: destPath=hdfs://
>>> 10.192.244.237:50070/input
>>> With failures, global counters are inaccurate; consider running with -i
>>> Copy failed: java.io.IOException: Failed on local exception:
>>> com.google.protobuf.InvalidProtocolBufferException: Protocol message
>>> end-group tag did not match expected tag.; Host Details : local host is:
>>> "hadoophost2/10.192.244.237"; destination host is: "
>>> bl1slu040.corp.adobe.com":50070;
>>> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759)
>>> at org.apache.hadoop.ipc.Client.call(Client.java:1164)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>> at $Proxy9.getFileInfo(Unknown Source)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:616)
>>> at
>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>>> at
>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>>> at $Proxy9.getFileInfo(Unknown Source)
>>> at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
>>> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1507)
>>> at
>>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:783)
>>> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1257)
>>> at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:636)
>>> at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
>>> at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>>> at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>>> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol
>>> message end-group tag did not match expected tag.
>>> at
>>> com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:73)
>>> at
>>> com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
>>> at
>>> com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:213)
>>> at
>>> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:746)
>>> at
>>> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:238)
>>> at
>>> com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:282)
>>> at
>>> com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:760)
>>> at
>>> com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:288)
>>> at
>>> com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:752)
>>> at
>>> org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcPayloadHeaderProtos.java:985)
>>> at
>>> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:882)
>>> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:813)
>>>
>>>
>>>
>>> Regards,
>>> samir
>>>
>>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>


Re: ISSUE IN CDH4.1.2 : transfer data between different HDFS clusters.(using distch)

2013-02-25 Thread samir das mohapatra
yes


On Mon, Feb 25, 2013 at 3:30 PM, Nitin Pawar wrote:

> does this match with your issue
>
> https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/kIPOvrFaQE8
>
>
> On Mon, Feb 25, 2013 at 3:20 PM, samir das mohapatra <
> samir.help...@gmail.com> wrote:
>
>>
>>
>> -- Forwarded message --
>> From: samir das mohapatra 
>> Date: Mon, Feb 25, 2013 at 3:05 PM
>> Subject: ISSUE IN CDH4.1.2 : transfer data between different HDFS
>> clusters.(using distch)
>> To: cdh-u...@cloudera.org
>>
>>
>> Hi All,
>>   I am getting the below error; can anyone help me with this issue?
>>
>> ERROR LOG:
>> --
>>
>> hadoop@hadoophost2:~$ hadoop   distcp hdfs://
>> 10.192.200.170:50070/tmp/samir.txt hdfs://10.192.244.237:50070/input
>> 13/02/25 01:34:36 INFO tools.DistCp: srcPaths=[hdfs://
>> 10.192.200.170:50070/tmp/samir.txt]
>> 13/02/25 01:34:36 INFO tools.DistCp: destPath=hdfs://
>> 10.192.244.237:50070/input
>> With failures, global counters are inaccurate; consider running with -i
>> Copy failed: java.io.IOException: Failed on local exception:
>> com.google.protobuf.InvalidProtocolBufferException: Protocol message
>> end-group tag did not match expected tag.; Host Details : local host is:
>> "hadoophost2/10.192.244.237"; destination host is: "
>> bl1slu040.corp.adobe.com":50070;
>> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1164)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>> at $Proxy9.getFileInfo(Unknown Source)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:616)
>> at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>> at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>> at $Proxy9.getFileInfo(Unknown Source)
>> at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
>> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1507)
>> at
>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:783)
>> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1257)
>> at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:636)
>> at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
>> at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>> at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol
>> message end-group tag did not match expected tag.
>> at
>> com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:73)
>> at
>> com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
>> at
>> com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:213)
>> at
>> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:746)
>> at
>> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:238)
>> at
>> com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:282)
>> at
>> com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:760)
>> at
>> com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:288)
>> at
>> com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:752)
>> at
>> org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcPayloadHeaderProtos.java:985)
>> at
>> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:882)
>> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:813)
>>
>>
>>
>> Regards,
>> samir
>>
>>
>
>
> --
> Nitin Pawar
>


Re: Datanodes shutdown and HBase's regionservers not working

2013-02-25 Thread Nicolas Liochon
I agree.
Then for HDFS, ...
The first thing to check is the network I would say.




On Mon, Feb 25, 2013 at 10:46 AM, Davey Yan  wrote:

> Thanks for reply, Nicolas.
>
> My question: What can lead to shutdown of all of the datanodes?
> I believe that the regionservers will be OK if the HDFS is OK.
>
>
> On Mon, Feb 25, 2013 at 5:31 PM, Nicolas Liochon 
> wrote:
> > Ok, what's your question?
> > When you say the datanode went down, was it the datanode processes or the
> > machines, with both the datanodes and the regionservers?
> >
> > The NameNode pings its datanodes every 3 seconds. However it will
> internally
> > mark the datanodes as dead after 10:30 minutes (even if in the gui you
> have
> > 'no answer for x minutes').
> > HBase monitoring is done by ZooKeeper. By default, a regionserver is
> > considered as dead after 180s with no answer. Before, well, it's
> considered
> > as live.
> > When you stop a regionserver, it tries to flush its data to the disk
> (i.e.
> > hdfs, i.e. the datanodes). That's why if you have no datanodes, or if a
> high
> > ratio of your datanodes are dead, it can't shutdown. Connection refused &
> > socket timeouts come from the fact that before the 10:30 minutes hdfs
> does
> > not declare the nodes as dead, so hbase tries to use them (and,
> obviously,
> > fails). Note that there is now  an intermediate state for hdfs datanodes,
> > called "stale": an intermediary state where the datanode is used only if
> you
> > have to (i.e. it's the only datanode with a block replica you need). It
> will
> > be documented in HBase for the 0.96 release. But if all your datanodes
> are
> > down it won't change much.
> >
> > Cheers,
> >
> > Nicolas
> >
> >
> >
> > On Mon, Feb 25, 2013 at 10:10 AM, Davey Yan  wrote:
> >>
> >> Hey guys,
> >>
> >> We have a cluster with 5 nodes(1 NN and 4 DNs) running for more than 1
> >> year, and it works fine.
> >> But the datanodes got shutdown twice in the last month.
> >>
> >> When the datanodes got shutdown, all of them became "Dead Nodes" in
> >> the NN web admin UI(http://ip:50070/dfshealth.jsp),
> >> but regionservers of HBase were still live in the HBase web
> >> admin(http://ip:60010/master-status), of course, they were zombies.
> >> All of the processes of jvm were still running, including
> >> hmaster/namenode/regionserver/datanode.
> >>
> >> When the datanodes got shutdown, the load (using the "top" command) of
> >> slaves became very high, more than 10, higher than normal running.
> >> From the "top" command, we saw that the processes of datanode and
> >> regionserver were consuming CPU.
> >>
> >> We could not stop the HBase or Hadoop cluster through normal
> >> commands(stop-*.sh/*-daemon.sh stop *).
> >> So we stopped datanodes and regionservers by kill -9 PID, then the
> >> load of slaves returned to normal level, and we start the cluster
> >> again.
> >>
> >>
> >> Log of NN at the shutdown point(All of the DNs were removed):
> >> 2013-02-22 11:10:02,278 INFO org.apache.hadoop.net.NetworkTopology:
> >> Removing a node: /default-rack/192.168.1.152:50010
> >> 2013-02-22 11:10:02,278 INFO org.apache.hadoop.hdfs.StateChange:
> >> BLOCK* NameSystem.heartbeatCheck: lost heartbeat from
> >> 192.168.1.149:50010
> >> 2013-02-22 11:10:02,693 INFO org.apache.hadoop.net.NetworkTopology:
> >> Removing a node: /default-rack/192.168.1.149:50010
> >> 2013-02-22 11:10:02,693 INFO org.apache.hadoop.hdfs.StateChange:
> >> BLOCK* NameSystem.heartbeatCheck: lost heartbeat from
> >> 192.168.1.150:50010
> >> 2013-02-22 11:10:03,004 INFO org.apache.hadoop.net.NetworkTopology:
> >> Removing a node: /default-rack/192.168.1.150:50010
> >> 2013-02-22 11:10:03,004 INFO org.apache.hadoop.hdfs.StateChange:
> >> BLOCK* NameSystem.heartbeatCheck: lost heartbeat from
> >> 192.168.1.148:50010
> >> 2013-02-22 11:10:03,339 INFO org.apache.hadoop.net.NetworkTopology:
> >> Removing a node: /default-rack/192.168.1.148:50010
> >>
> >>
> >> Logs in DNs indicated there were many IOException and
> >> SocketTimeoutException:
> >> 2013-02-22 11:02:52,354 ERROR
> >> org.apache.hadoop.hdfs.server.datanode.DataNode:
> >> DatanodeRegistration(192.168.1.148:50010,
> >> storageID=DS-970284113-117.25.149.160-50010-1328074119937,
> >> infoPort=50075, ipcPort=50020):DataXceiver
> >> java.io.IOException: Interrupted receiveBlock
> >> at
> >>
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:577)
> >> at
> >>
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:398)
> >> at
> >>
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:107)
> >> at java.lang.Thread.run(Thread.java:662)
> >> 2013-02-22 11:03:44,823 WARN
> >> org.apache.hadoop.hdfs.server.datanode.DataNode:
> >> DatanodeRegistration(192.168.1.148:50010,
> >> storageID=DS-970284113-117.25.149.160-50010-1328074119937,
> >> infoPort=50075, ipcPort=50020):Got exception while serving
> >> blk_-1985405101514576650_247001 to /

Re: ISSUE IN CDH4.1.2 : transfer data between different HDFS clusters.(using distch)

2013-02-25 Thread Nitin Pawar
does this match with your issue

https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/kIPOvrFaQE8
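If it is the same problem, the giveaway is the :50070 in both URIs: that is
the NameNode web/HTTP port, so the RPC client gets an HTTP reply it cannot
parse as protobuf, hence the end-group tag error. A sketch of the same command
against the NameNode RPC port (assuming the default 8020; check fs.default.name
or fs.defaultFS on each cluster) would be:

hadoop distcp hdfs://10.192.200.170:8020/tmp/samir.txt hdfs://10.192.244.237:8020/input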


On Mon, Feb 25, 2013 at 3:20 PM, samir das mohapatra <
samir.help...@gmail.com> wrote:

>
>
> -- Forwarded message --
> From: samir das mohapatra 
> Date: Mon, Feb 25, 2013 at 3:05 PM
> Subject: ISSUE IN CDH4.1.2 : transfer data between different HDFS
> clusters.(using distch)
> To: cdh-u...@cloudera.org
>
>
> Hi All,
>   I am getting the below error; can anyone help me with this issue?
>
> ERROR LOG:
> --
>
> hadoop@hadoophost2:~$ hadoop   distcp hdfs://
> 10.192.200.170:50070/tmp/samir.txt hdfs://10.192.244.237:50070/input
> 13/02/25 01:34:36 INFO tools.DistCp: srcPaths=[hdfs://
> 10.192.200.170:50070/tmp/samir.txt]
> 13/02/25 01:34:36 INFO tools.DistCp: destPath=hdfs://
> 10.192.244.237:50070/input
> With failures, global counters are inaccurate; consider running with -i
> Copy failed: java.io.IOException: Failed on local exception:
> com.google.protobuf.InvalidProtocolBufferException: Protocol message
> end-group tag did not match expected tag.; Host Details : local host is:
> "hadoophost2/10.192.244.237"; destination host is: "
> bl1slu040.corp.adobe.com":50070;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759)
> at org.apache.hadoop.ipc.Client.call(Client.java:1164)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> at $Proxy9.getFileInfo(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> at $Proxy9.getFileInfo(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1507)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:783)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1257)
> at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:636)
> at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol
> message end-group tag did not match expected tag.
> at
> com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:73)
> at
> com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
> at
> com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:213)
> at
> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:746)
> at
> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:238)
> at
> com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:282)
> at
> com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:760)
> at
> com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:288)
> at
> com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:752)
> at
> org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcPayloadHeaderProtos.java:985)
> at
> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:882)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:813)
>
>
>
> Regards,
> samir
>
>


-- 
Nitin Pawar


Fwd: ISSUE IN CDH4.1.2 : transfer data between different HDFS clusters.(using distch)

2013-02-25 Thread samir das mohapatra
-- Forwarded message --
From: samir das mohapatra 
Date: Mon, Feb 25, 2013 at 3:05 PM
Subject: ISSUE IN CDH4.1.2 : transfer data between different HDFS
clusters.(using distch)
To: cdh-u...@cloudera.org


Hi All,
  I am getting the below error; can anyone help me with this issue?

ERROR LOG:
--

hadoop@hadoophost2:~$ hadoop   distcp hdfs://
10.192.200.170:50070/tmp/samir.txt hdfs://10.192.244.237:50070/input
13/02/25 01:34:36 INFO tools.DistCp: srcPaths=[hdfs://
10.192.200.170:50070/tmp/samir.txt]
13/02/25 01:34:36 INFO tools.DistCp: destPath=hdfs://
10.192.244.237:50070/input
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: Failed on local exception:
com.google.protobuf.InvalidProtocolBufferException: Protocol message
end-group tag did not match expected tag.; Host Details : local host is:
"hadoophost2/10.192.244.237"; destination host is:
"bl1slu040.corp.adobe.com":50070;

at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759)
at org.apache.hadoop.ipc.Client.call(Client.java:1164)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy9.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy9.getFileInfo(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1507)
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:783)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1257)
at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:636)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol
message end-group tag did not match expected tag.
at
com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:73)
at
com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
at
com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:213)
at
com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:746)
at
com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:238)
at
com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:282)
at
com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:760)
at
com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:288)
at
com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:752)
at
org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcPayloadHeaderProtos.java:985)
at
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:882)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:813)



Regards,
samir


Re: Datanodes shutdown and HBase's regionservers not working

2013-02-25 Thread Davey Yan
Thanks for reply, Nicolas.

My question: What can lead to shutdown of all of the datanodes?
I believe that the regionservers will be OK if the HDFS is OK.


On Mon, Feb 25, 2013 at 5:31 PM, Nicolas Liochon  wrote:
> Ok, what's your question?
> When you say the datanode went down, was it the datanode processes or the
> machines, with both the datanodes and the regionservers?
>
> The NameNode pings its datanodes every 3 seconds. However it will internally
> mark the datanodes as dead after 10:30 minutes (even if in the gui you have
> 'no answer for x minutes').
> HBase monitoring is done by ZooKeeper. By default, a regionserver is
> considered as dead after 180s with no answer. Before, well, it's considered
> as live.
> When you stop a regionserver, it tries to flush its data to the disk (i.e.
> hdfs, i.e. the datanodes). That's why if you have no datanodes, or if a high
> ratio of your datanodes are dead, it can't shutdown. Connection refused &
> socket timeouts come from the fact that before the 10:30 minutes hdfs does
> not declare the nodes as dead, so hbase tries to use them (and, obviously,
> fails). Note that there is now  an intermediate state for hdfs datanodes,
> called "stale": an intermediary state where the datanode is used only if you
> have to (i.e. it's the only datanode with a block replica you need). It will
> be documented in HBase for the 0.96 release. But if all your datanodes are
> down it won't change much.
>
> Cheers,
>
> Nicolas
>
>
>
> On Mon, Feb 25, 2013 at 10:10 AM, Davey Yan  wrote:
>>
>> Hey guys,
>>
>> We have a cluster with 5 nodes(1 NN and 4 DNs) running for more than 1
>> year, and it works fine.
>> But the datanodes got shutdown twice in the last month.
>>
>> When the datanodes got shutdown, all of them became "Dead Nodes" in
>> the NN web admin UI(http://ip:50070/dfshealth.jsp),
>> but regionservers of HBase were still live in the HBase web
>> admin(http://ip:60010/master-status), of course, they were zombies.
>> All of the processes of jvm were still running, including
>> hmaster/namenode/regionserver/datanode.
>>
>> When the datanodes got shutdown, the load (using the "top" command) of
>> slaves became very high, more than 10, higher than normal running.
>> From the "top" command, we saw that the processes of datanode and
>> regionserver were consuming CPU.
>>
>> We could not stop the HBase or Hadoop cluster through normal
>> commands(stop-*.sh/*-daemon.sh stop *).
>> So we stopped datanodes and regionservers by kill -9 PID, then the
>> load of slaves returned to normal level, and we start the cluster
>> again.
>>
>>
>> Log of NN at the shutdown point(All of the DNs were removed):
>> 2013-02-22 11:10:02,278 INFO org.apache.hadoop.net.NetworkTopology:
>> Removing a node: /default-rack/192.168.1.152:50010
>> 2013-02-22 11:10:02,278 INFO org.apache.hadoop.hdfs.StateChange:
>> BLOCK* NameSystem.heartbeatCheck: lost heartbeat from
>> 192.168.1.149:50010
>> 2013-02-22 11:10:02,693 INFO org.apache.hadoop.net.NetworkTopology:
>> Removing a node: /default-rack/192.168.1.149:50010
>> 2013-02-22 11:10:02,693 INFO org.apache.hadoop.hdfs.StateChange:
>> BLOCK* NameSystem.heartbeatCheck: lost heartbeat from
>> 192.168.1.150:50010
>> 2013-02-22 11:10:03,004 INFO org.apache.hadoop.net.NetworkTopology:
>> Removing a node: /default-rack/192.168.1.150:50010
>> 2013-02-22 11:10:03,004 INFO org.apache.hadoop.hdfs.StateChange:
>> BLOCK* NameSystem.heartbeatCheck: lost heartbeat from
>> 192.168.1.148:50010
>> 2013-02-22 11:10:03,339 INFO org.apache.hadoop.net.NetworkTopology:
>> Removing a node: /default-rack/192.168.1.148:50010
>>
>>
>> Logs in DNs indicated there were many IOException and
>> SocketTimeoutException:
>> 2013-02-22 11:02:52,354 ERROR
>> org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(192.168.1.148:50010,
>> storageID=DS-970284113-117.25.149.160-50010-1328074119937,
>> infoPort=50075, ipcPort=50020):DataXceiver
>> java.io.IOException: Interrupted receiveBlock
>> at
>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:577)
>> at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:398)
>> at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:107)
>> at java.lang.Thread.run(Thread.java:662)
>> 2013-02-22 11:03:44,823 WARN
>> org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(192.168.1.148:50010,
>> storageID=DS-970284113-117.25.149.160-50010-1328074119937,
>> infoPort=50075, ipcPort=50020):Got exception while serving
>> blk_-1985405101514576650_247001 to /192.168.1.148:
>> java.net.SocketTimeoutException: 48 millis timeout while waiting
>> for channel to be ready for write. ch :
>> java.nio.channels.SocketChannel[connected local=/192.168.1.148:50010
>> remote=/192.168.1.148:48654]
>> at
>> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>> at
>> org.apache.hadoop

Re: Datanodes shutdown and HBase's regionservers not working

2013-02-25 Thread Nicolas Liochon
Ok, what's your question?
When you say the datanode went down, was it the datanode processes or the
machines, with both the datanodes and the regionservers?

The NameNode pings its datanodes every 3 seconds. However it will
internally mark the datanodes as dead after 10:30 minutes (even if in the
gui you have 'no answer for x minutes').
HBase monitoring is done by ZooKeeper. By default, a regionserver is
considered as dead after 180s with no answer. Before, well, it's considered
as live.
When you stop a regionserver, it tries to flush its data to the disk (i.e.
hdfs, i.e. the datanodes). That's why if you have no datanodes, or if a
high ratio of your datanodes are dead, it can't shutdown. Connection
refused & socket timeouts come from the fact that before the 10:30 minutes
hdfs does not declare the nodes as dead, so hbase tries to use them (and,
obviously, fails). Note that there is now  an intermediate state for hdfs
datanodes, called "stale": an intermediary state where the datanode is used
only if you have to (i.e. it's the only datanode with a block replica you
need). It will be documented in HBase for the 0.96 release. But if all your
datanodes are down it won't change much.
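For reference, the stale-datanode behaviour is driven by NameNode settings
along these lines (property names come from the HDFS-3703 line of work; verify
them against your exact version):

<property>
  <name>dfs.namenode.avoid.read.stale.datanode</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.stale.datanode.interval</name>
  <value>30000</value>
</property>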

Cheers,

Nicolas



On Mon, Feb 25, 2013 at 10:10 AM, Davey Yan  wrote:

> Hey guys,
>
> We have a cluster with 5 nodes(1 NN and 4 DNs) running for more than 1
> year, and it works fine.
> But the datanodes got shutdown twice in the last month.
>
> When the datanodes got shutdown, all of them became "Dead Nodes" in
> the NN web admin UI(http://ip:50070/dfshealth.jsp),
> but regionservers of HBase were still live in the HBase web
> admin(http://ip:60010/master-status), of course, they were zombies.
> All of the processes of jvm were still running, including
> hmaster/namenode/regionserver/datanode.
>
> When the datanodes got shutdown, the load (using the "top" command) of
> slaves became very high, more than 10, higher than normal running.
> From the "top" command, we saw that the processes of datanode and
> regionserver were consuming CPU.
>
> We could not stop the HBase or Hadoop cluster through normal
> commands(stop-*.sh/*-daemon.sh stop *).
> So we stopped datanodes and regionservers by kill -9 PID, then the
> load of slaves returned to normal level, and we start the cluster
> again.
>
>
> Log of NN at the shutdown point(All of the DNs were removed):
> 2013-02-22 11:10:02,278 INFO org.apache.hadoop.net.NetworkTopology:
> Removing a node: /default-rack/192.168.1.152:50010
> 2013-02-22 11:10:02,278 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* NameSystem.heartbeatCheck: lost heartbeat from
> 192.168.1.149:50010
> 2013-02-22 11:10:02,693 INFO org.apache.hadoop.net.NetworkTopology:
> Removing a node: /default-rack/192.168.1.149:50010
> 2013-02-22 11:10:02,693 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* NameSystem.heartbeatCheck: lost heartbeat from
> 192.168.1.150:50010
> 2013-02-22 11:10:03,004 INFO org.apache.hadoop.net.NetworkTopology:
> Removing a node: /default-rack/192.168.1.150:50010
> 2013-02-22 11:10:03,004 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* NameSystem.heartbeatCheck: lost heartbeat from
> 192.168.1.148:50010
> 2013-02-22 11:10:03,339 INFO org.apache.hadoop.net.NetworkTopology:
> Removing a node: /default-rack/192.168.1.148:50010
>
>
> Logs in DNs indicated there were many IOException and
> SocketTimeoutException:
> 2013-02-22 11:02:52,354 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(192.168.1.148:50010,
> storageID=DS-970284113-117.25.149.160-50010-1328074119937,
> infoPort=50075, ipcPort=50020):DataXceiver
> java.io.IOException: Interrupted receiveBlock
> at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:577)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:398)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:107)
> at java.lang.Thread.run(Thread.java:662)
> 2013-02-22 11:03:44,823 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(192.168.1.148:50010,
> storageID=DS-970284113-117.25.149.160-50010-1328074119937,
> infoPort=50075, ipcPort=50020):Got exception while serving
> blk_-1985405101514576650_247001 to /192.168.1.148:
> java.net.SocketTimeoutException: 48 millis timeout while waiting
> for channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.1.148:50010
> remote=/192.168.1.148:48654]
> at
> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
> at
> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
> at
> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:350)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendB

Re: WordPairCount Mapreduce question.

2013-02-25 Thread Harsh J
Also noteworthy is that the performance gain can only be had (from the
byte level compare method) iff the
serialization/deserialization/format of data is comparable at the byte
level. One such provider is Apache Avro:
http://avro.apache.org/docs/current/spec.html#order.

Most other implementations simply deserialize again from the
bytestream and then compare, which has a higher (or, regular) cost.
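As a small self-contained sketch of what a byte-comparable key can look like
(the class and field names here are made up for illustration): two fixed-width,
non-negative int fields are written big-endian, so comparing the raw serialized
bytes yields the same order as compareTo() on the deserialized objects.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Illustrative key with two non-negative int fields. Negative values would
// break the raw-byte ordering and would need the sign bit flipped on write.
public class PairKey implements WritableComparable<PairKey> {

    private int first;
    private int second;

    public PairKey() {}

    public PairKey(int first, int second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Fixed-width big-endian ints keep the serialized form byte-comparable.
        out.writeInt(first);
        out.writeInt(second);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        first = in.readInt();
        second = in.readInt();
    }

    @Override
    public int compareTo(PairKey o) {
        if (first != o.first) {
            return first < o.first ? -1 : 1;
        }
        if (second != o.second) {
            return second < o.second ? -1 : 1;
        }
        return 0;
    }

    @Override
    public int hashCode() {
        return first * 31 + second;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof PairKey)) {
            return false;
        }
        PairKey other = (PairKey) obj;
        return first == other.first && second == other.second;
    }

    // Raw comparator: compares the serialized bytes directly, no deserialization.
    public static class Comparator extends WritableComparator {
        public Comparator() {
            super(PairKey.class);
        }

        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            return compareBytes(b1, s1, l1, b2, s2, l2);
        }
    }

    static {
        // Register the raw comparator so the sort phase uses it for this key type.
        WritableComparator.define(PairKey.class, new Comparator());
    }
}

The static define() call is what makes the framework pick the byte-level
comparator up during the sort, instead of deserializing each pair of keys.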

On Mon, Feb 25, 2013 at 1:44 PM, Mahesh Balija
 wrote:
> byte array comparison is for performance reasons only, but NOT the way you
> are thinking.
> This method comes from an interface called RawComparator which provides the
> prototype (public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2,
> int l2);) for this method.
> In the sorting phase, where the keys are sorted, because of this
> implementation the records are read from the stream directly and sorted
> without the need to deserialize them into objects.
>
> Best,
> Mahesh Balija,
> CalsoftLabs.
>
>
> On Sun, Feb 24, 2013 at 5:01 PM, Sai Sai  wrote:
>>
>> Thanks Mahesh for your help.
>>
>> Wondering if u can provide some insight with the below compare method
>> using byte[] in the SecondarySort example:
>>
>> public static class Comparator extends WritableComparator {
>> public Comparator() {
>> super(URICountKey.class);
>> }
>>
>> public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2,
>> int l2) {
>> return compareBytes(b1, s1, l1, b2, s2, l2);
>> }
>> }
>>
>> My question is in the below compare method that i have given we are
>> comparing word1/word2
>> which makes sense but what about this byte[] comparison, is it right in
>> assuming  it converts each objects word1/word2/word3 to byte[] and compares
>> them.
>> If so is it for performance reason it is done.
>> Could you please verify.
>> Thanks
>> Sai
>> 
>> From: Mahesh Balija 
>> To: user@hadoop.apache.org; Sai Sai 
>> Sent: Saturday, 23 February 2013 5:23 AM
>> Subject: Re: WordPairCount Mapreduce question.
>>
>> Please check the in-line answers...
>>
>> On Sat, Feb 23, 2013 at 6:22 PM, Sai Sai  wrote:
>>
>>
>> Hello
>>
>> I have a question about how Mapreduce sorting works internally with
>> multiple columns.
>>
>> Below r my classes using 2 columns in an input file given below.
>>
>> 1st question: About the method hashCode, we are adding a "31 + "; I am
>> wondering why this is required. What does 31 refer to?
>>
>> This is how usually hashcode is calculated for any String instance
>> (s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]) where n stands for length of
>> the String. Since in your case you only have 2 chars then it will be a *
>> 31^0 + b * 31^1.
>>
>>
>>
>> 2nd question: What if my input file has 3 columns instead of 2? How would
>> you write a compare method? If anyone can map this to a
>> real-world scenario it would be really helpful.
>>
>> you will extend the same approach for the third column,
>>  public int compareTo(WordPairCountKey o) {
>> int diff = word1.compareTo(o.word1);
>> if (diff == 0) {
>> diff = word2.compareTo(o.word2);
>> if(diff==0){
>>  diff = word3.compareTo(o.word3);
>> }
>> }
>> return diff;
>> }
>>
>>
>>
>>
>> @Override
>> public int compareTo(WordPairCountKey o) {
>> int diff = word1.compareTo(o.word1);
>> if (diff == 0) {
>> diff = word2.compareTo(o.word2);
>> }
>> return diff;
>> }
>>
>> @Override
>> public int hashCode() {
>> return word1.hashCode() + 31 * word2.hashCode();
>> }
>>
>> **
>>
>> Here is my input file wordpair.txt
>>
>> **
>>
>> ab
>> ac
>> ab
>> ad
>> bd
>> ef
>> bd
>> ef
>> bd
>>
>> **
>>
>> Here is my WordPairObject:
>>
>> *
>>
>> public class WordPairCountKey implements
>> WritableComparable {
>>
>> private String word1;
>> private String word2;
>>
>> @Override
>> public int compareTo(WordPairCountKey o) {
>> int diff = word1.compareTo(o.word1);
>> if (diff == 0) {
>> diff = word2.compareTo(o.word2);
>> }
>> return diff;
>> }
>>
>> @Override
>> public int hashCode() {
>> return word1.hashCode() + 31 * word2.hashCode();
>> }
>>
>>
>> public String getWord1() {
>> return word1;
>> }
>>
>> public void setWord1(String word1) {
>> this.word1 = word1;
>> }
>>
>> public String getWord2() {
>> return word2;
>> }
>>
>> public void setWord2(String word2) {
>> this.word2 = word2;
>> }
>>
>> @Override
>> public void readFields(DataInput in) throws IOException {
>> word1 = in.readUTF();
>> word2 = in.readUTF();
>> }
>>
>> @Override
>> p

How do _you_ document your hadoop jobs?

2013-02-25 Thread David Parks
We've taken to documenting our Hadoop jobs in a simple visual manner using
PPT (attached example). I wonder how others document their jobs?

 

We often add notes to the text section of the PPT slides as well.

 




Re: MapReduce job over HBase-0.94.3 fails

2013-02-25 Thread Harsh J
Hi Bhushan,

Please email the user@hadoop.apache.org lists for any Apache Hadoop
user-related questions (instead of sending it to me directly this
way). You can subscribe by following instructions at
http://hadoop.apache.org/mailing_lists.html. I've added the list in my
reply here.

On Mon, Feb 25, 2013 at 1:09 PM,   wrote:
> I am using Hadoop-1.0.3 and HBase-0.94.3. I am able to run a MapReduce job over 
> Hadoop-1.0.3, but I got the following error while running a MapReduce job over 
> HBase-0.94.3.
> Exception in thread "main" java.lang.NullPointerException
> at org.apache.hadoop.net.DNS.reverseDns(DNS.java:72)
> at 
> org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.reverseDNS(TableInputFormatBase.java:218)
> at 
> org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:183)
> at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
> at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
> at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
> at org.apache.hadoop.hbase.mapreduce.CopyTable.main(CopyTable.java:237)
>
>
> Is there any versioning issue for HBase?
>
> How to resolve it?
>
> Thanks in advance.
>



--
Harsh J


Re: adding space on existing datanode ?

2013-02-25 Thread bejoy . hadoop
Hi Brice

By adding a new storage location to dfs.data.dir you are not incrementing the
replication factor.

You are giving one more location where blocks can be stored on that DataNode.

There is no new DataNode added. A new DataNode would only come live if you
tweak your configs and start a new DataNode daemon.
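
As a rough sketch, the change on the DataNode side is just one more
comma-separated entry for dfs.data.dir in hdfs-site.xml (the paths below are
placeholders):

  <property>
    <name>dfs.data.dir</name>
    <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
  </property>

Restart that DataNode daemon afterwards so it starts writing blocks to the new
directory as well.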


Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-Original Message-
From: brice lecomte 
Date: Mon, 25 Feb 2013 09:50:29 
To: 
Reply-To: user@hadoop.apache.org
Subject: Re: adding space on existing datanode ?

Thanks for your reply, I'm running 1.1.1, hence dfs.data.dir looks to be
the right property to add, but doing so, it would add another complete
datanode (incrementing dfs.replication by 1) whereas here, I'd like just
to "extend" an existing one. Am I wrong ?


On 22/02/2013 19:56, Patai Sangbutsarakum wrote:
> Just want to add up from JM.
>
> If you already have balancer run in cluster every day, that will help
> the new drive(s) get balanced.
>
> P
>
> From: Jean-Marc Spaggiari
> Reply-To: user@hadoop.apache.org
> Date: Fri, 22 Feb 2013 13:14:14 -0500
> To: user@hadoop.apache.org
> Subject: Re: adding space on existing datanode ?
>
> To add disk space to your datanode you simply need to add another drive,
> then add it to the dfs.data.dir or dfs.datanode.data.dir entry. After
> a datanode restart, hadoop will start to use it.
>
> It will not balance the existing data between the directories. It will
> continue to add to the 2. If one goes full, it will only continue with
> the other one. If required, you can balance the data manually. Or
> depending on your use case and the options you have, you can stop the
> datanode, delete the content of the 2 data directories and restart it.
> It will start to receive data to duplicate and will share it evenly
> between the 2 directories. This last solution is not recommended. But
> for a test environment it might be easier. 




Re: adding space on existing datanode ?

2013-02-25 Thread brice lecomte
Thanks for your reply, I'm running 1.1.1, hence dfs.data.dir looks to be
the right property to add, but doing so, it would add another complete
datanode (incrementing dfs.replication by 1) whereas here, I'd like just
to "extend" an existing one. Am I wrong ?


On 22/02/2013 19:56, Patai Sangbutsarakum wrote:
> Just want to add up from JM.
>
> If you already have balancer run in cluster every day, that will help
> the new drive(s) get balanced.
>
> P
>
> From: Jean-Marc Spaggiari
> Reply-To: user@hadoop.apache.org
> Date: Fri, 22 Feb 2013 13:14:14 -0500
> To: user@hadoop.apache.org
> Subject: Re: adding space on existing datanode ?
>
> To add disk space to your datanode you simply need to add another drive,
> then add it to the dfs.data.dir or dfs.datanode.data.dir entry. After
> a datanode restart, hadoop will start to use it.
>
> It will not balance the existing data between the directories. It will
> continue to add to the 2. If one goes full, it will only continue with
> the other one. If required, you can balance the data manually. Or
> depending on your use case and the options you have, you can stop the
> datanode, delete the content of the 2 data directories and restart it.
> It will start to receive data to duplicate and will share it evenly
> between the 2 directories. This last solution is not recommended. But
> for a test environment it might be easier. 



Re: Slow MR time and high network utilization with all local data

2013-02-25 Thread Robert Dyer
Thanks for pointing me towards short circuit!  I dug around and couldn't
find in the logs any mention of the local reader loading, and then spotted
a config error.  So when I used HBase it set the short circuit via its
configs (which were correct), but when I didn't use HBase it failed to set
the short circuit.
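
For reference, a minimal sketch of how a job's client-side Configuration can
request short-circuit reads (dfs.client.read.shortcircuit is the client
switch; the DataNode side also has to allow it, and the companion properties
differ between Hadoop 1.x and 2.x, so treat the class below as an illustration
only):

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class ShortCircuitJobSetup {
      public static void main(String[] args) throws IOException {
          // Ask the HDFS client to read local blocks directly instead of
          // streaming them through the DataNode socket.
          Configuration conf = new Configuration();
          conf.setBoolean("dfs.client.read.shortcircuit", true);
          // Hadoop 1.x DataNodes also need dfs.block.local-path-access.user
          // to list the job user; Hadoop 2.x uses dfs.domain.socket.path.
          Job job = new Job(conf, "short-circuit-read-example");
          System.out.println("short-circuit requested: "
                  + job.getConfiguration().getBoolean("dfs.client.read.shortcircuit", false));
      }
  }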

Now I see no network utilization for this job and it runs *much* faster (13
mins instead of 2+ hours)!  Problem solved! :-)

Thanks Harsh!


On Mon, Feb 25, 2013 at 1:41 AM, Robert Dyer  wrote:

> I am using Ganglia.
>
> Note I have short circuit reads enabled (I think, I never verified it was
> working but I do get errors if I run jobs as another user).
>
> Also, if Ganglia's network use included the local socket then I would see
> network utilization in all cases.  I see no utilization when using HBase as
> MR input and MapFile.  I also see a small amount when using HBase for both
> (as one would expect).
>
>
> On Mon, Feb 25, 2013 at 1:22 AM, Harsh J  wrote:
>
>> Hi Robert,
>>
>> How are you measuring the network usage? Note that unless short
>> circuit reading is on, data reads are done over a local socket as
>> well, and may show up in network traffic monitoring tools too (but that
>> does not mean the reads actually go over the network).
>>
>> On Mon, Feb 25, 2013 at 2:35 AM, Robert Dyer  wrote:
>> > I have a small 6 node dev cluster.  I use a 1GB SequenceFile as input
>> to a
>> > MapReduce job, using a custom split size of 10MB (to increase the
>> number of
>> > maps).  Each map call will read random entries out of a shared MapFile
>> (that
>> > is around 50GB).
>> >
>> > I set replication to 6 on both of these files, so all of the data
>> should be
>> > local for each map task.  I verified via fsck that no blocks are
>> > under-replicated.
>> >
>> > Despite this, for some reason the MR job maxes out the network and
>> takes an
>> > extremely long time.  What could be causing this?
>> >
>> > Note that the total number of map outputs for this job is around 400
>> and the
>> > reducer just passes the values through, so there shouldn't be much
>> network
>> > utilized by the output.
>> >
>> > As an experiment, I switched from the SeqFile input to an HBase table
>> and
>> > now see almost no network used.  I also tried leaving the SeqFile as
>> input
>> > and switched the MapFile to an HBase table and see about 30% network
>> used
>> > (which makes sense, as now that 50GB data isn't always local).
>> >
>> > What is going on here?  How can I debug to see what data is being
>> > transferred over the network?
>>
>>
>>
>> --
>> Harsh J
>>
>
>
>
> --
>
> Robert Dyer
> rd...@iastate.edu
>



-- 

Robert Dyer
rd...@iastate.edu


Re: WordPairCount Mapreduce question.

2013-02-25 Thread Mahesh Balija
The byte array comparison is there for performance reasons, but not in the way
you are thinking.
The method comes from an interface called RawComparator, which declares the
prototype (public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2,
int l2);) for it.
In the sort phase, this implementation lets the framework compare the
serialized keys directly from the stream and sort them without deserializing
them into objects first.
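
As a minimal sketch (assuming WordPairCountKey writes its two fields with
writeUTF, as in the class quoted further down), a registered raw comparator
would look roughly like this:

  import org.apache.hadoop.io.WritableComparator;

  // Raw comparator for WordPairCountKey: compares the serialized bytes
  // directly, so keys are never deserialized during the sort.
  public static class Comparator extends WritableComparator {
      public Comparator() {
          super(WordPairCountKey.class);
      }

      @Override
      public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
          // Note: this compares the raw writeUTF encoding, length prefixes
          // included, so it only matches compareTo's ordering when the words
          // have equal byte lengths (as in wordpair.txt); a fully general raw
          // comparator would decode the UTF length prefixes first.
          return compareBytes(b1, s1, l1, b2, s2, l2);
      }
  }

  static {
      // Register it so the framework uses this comparator for the key class.
      WritableComparator.define(WordPairCountKey.class, new Comparator());
  }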

Best,
Mahesh Balija,
CalsoftLabs.

On Sun, Feb 24, 2013 at 5:01 PM, Sai Sai  wrote:

> Thanks Mahesh for your help.
>
> Wondering if you can provide some insight into the below compare method
> using byte[] in the SecondarySort example:
>
> public static class Comparator extends WritableComparator {
> public Comparator() {
> super(URICountKey.class);
> }
>
> public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2,
> int l2) {
> return compareBytes(b1, s1, l1, b2, s2, l2);
> }
> }
>
> My question is: in the below compare method that I have given, we are
> comparing word1/word2,
> which makes sense, but what about this byte[] comparison? Is it right to
> assume it converts each object's word1/word2/word3 to byte[] and compares
> them?
> If so, is it done for performance reasons?
> Could you please verify.
> Thanks
> Sai
>   --
> *From:* Mahesh Balija 
> *To:* user@hadoop.apache.org; Sai Sai 
> *Sent:* Saturday, 23 February 2013 5:23 AM
> *Subject:* Re: WordPairCount Mapreduce question.
>
> Please check the in-line answers...
>
> On Sat, Feb 23, 2013 at 6:22 PM, Sai Sai  wrote:
>
>
> Hello
>
> I have a question about how Mapreduce sorting works internally with
> multiple columns.
>
> Below are my classes using 2 columns in an input file given below.
>
> 1st question: About the method hashCode, we are adding a "31 + "; I am
> wondering why this is required. What does 31 refer to?
>
> This is how the hashcode is usually calculated for any String instance:
> (s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]) where n stands for the length
> of the String; 31 is just an odd prime used as the multiplier. Since in your
> case each word has only 2 chars, its hash is s[0]*31 + s[1], and the key then
> combines the two words as word1.hashCode() + 31 * word2.hashCode().
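
A tiny worked example of that combination, using single-character words like
the ones in wordpair.txt (the class name is just for the demo):

  // Shows how the 31 multiplier keeps (word1, word2) and (word2, word1) apart.
  public class PairHashDemo {
      public static void main(String[] args) {
          // "a".hashCode() == 97 and "b".hashCode() == 98 (the char values).
          int ab = "a".hashCode() + 31 * "b".hashCode(); // 97 + 31*98 = 3135
          int ba = "b".hashCode() + 31 * "a".hashCode(); // 98 + 31*97 = 3105
          System.out.println(ab + " vs " + ba);          // different hash values
      }
  }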
>
>
>
> 2nd question: what if my input file has 3 columns instead of 2? How would
> you write a compare method? Also, if anyone can map this to a
> real-world scenario it would be really helpful.
>
> you will extend the same approach for the third column,
>  public int compareTo(WordPairCountKey o) {
> int diff = word1.compareTo(o.word1);
> if (diff == 0) {
> diff = word2.compareTo(o.word2);
> if(diff==0){
>  diff = word3.compareTo(o.word3);
> }
>  }
> return diff;
> }
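
A matching hashCode for such a three-column key (assuming it has a third
String field word3 alongside word1 and word2) can follow the same
31-multiplier pattern; a minimal sketch:

  @Override
  public int hashCode() {
      // Chain the fields with 31 so different orderings of the same words
      // still tend to land in different buckets.
      int result = word1.hashCode();
      result = 31 * result + word2.hashCode();
      result = 31 * result + word3.hashCode();
      return result;
  }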
>
>
>
>
> @Override
> public int compareTo(WordPairCountKey o) {
> int diff = word1.compareTo(o.word1);
> if (diff == 0) {
> diff = word2.compareTo(o.word2);
> }
> return diff;
> }
>
> @Override
> public int hashCode() {
> return word1.hashCode() + 31 * word2.hashCode();
> }
>
> **
>
> Here is my input file wordpair.txt
>
> **
>
> ab
> ac
> ab
> ad
> bd
> ef
> bd
> ef
> bd
>
> **
>
> Here is my WordPairObject:
>
> *
>
> public class WordPairCountKey implements
> WritableComparable {
>
> private String word1;
> private String word2;
>
> @Override
> public int compareTo(WordPairCountKey o) {
> int diff = word1.compareTo(o.word1);
> if (diff == 0) {
> diff = word2.compareTo(o.word2);
> }
> return diff;
> }
>
> @Override
> public int hashCode() {
> return word1.hashCode() + 31 * word2.hashCode();
> }
>
>
> public String getWord1() {
> return word1;
> }
>
> public void setWord1(String word1) {
> this.word1 = word1;
> }
>
> public String getWord2() {
> return word2;
> }
>
> public void setWord2(String word2) {
> this.word2 = word2;
> }
>
> @Override
> public void readFields(DataInput in) throws IOException {
> word1 = in.readUTF();
> word2 = in.readUTF();
> }
>
> @Override
> public void write(DataOutput out) throws IOException {
> out.writeUTF(word1);
> out.writeUTF(word2);
> }
>
>
> @Override
> public String toString() {
> return "[word1=" + word1 + ", word2=" + word2 + "]";
> }
>
> }
>
> **
>
> Any help will be really appreciated.
> Thanks
> Sai
>
>
>
>
>