Issue with filtering an HBase table using SingleColumnValueFilter
Dear developer, I am looking for a way to apply the SingleColumnValueFilter so that it selects only the exact value I pass in the value parameter and nothing else. Example: SingleColumnValueFilter colValFilter = new SingleColumnValueFilter(Bytes.toBytes(cf1), Bytes.toBytes(code), CompareFilter.CompareOp.EQUAL, new SubstringComparator(SAMIR_AL_START)); colValFilter.setFilterIfMissing(false); filters.add(colValFilter); Note: I want only the SAMIR_AL_START value, not values like XYZ_AL_START as well; I want an exact match, not a substring-style match. Right now it is returning both SAMIR_AL_START and XYZ_AL_START. Regards, samir.
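For an exact match the comparator is the piece to change; a minimal sketch using the stock HBase filter API (cf1, code and filters are the variables from the message above):

SingleColumnValueFilter colValFilter = new SingleColumnValueFilter(
    Bytes.toBytes(cf1), Bytes.toBytes(code),
    CompareFilter.CompareOp.EQUAL,
    // BinaryComparator compares the whole cell value byte-for-byte, so
    // XYZ_AL_START no longer matches the way it does with SubstringComparator
    new BinaryComparator(Bytes.toBytes("SAMIR_AL_START")));
// true = also drop rows that do not contain the column at all
colValFilter.setFilterIfMissing(true);
filters.add(colValFilter);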
Did anyone work with HBase MapReduce using multiple tables as input?
Dear Hadoop/HBase developers, did anyone work with HBase MapReduce using multiple tables as input? Any link or example will help me a lot. Thanks in advance. Thanks, samir.
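One way this is commonly wired up, assuming an HBase release that ships MultiTableInputFormat (0.94.5 and later); this is a sketch for the job-setup code, with table names, the mapper class and the output types as placeholders:

List<Scan> scans = new ArrayList<Scan>();
Scan scan1 = new Scan();
scan1.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes("table_one"));
scans.add(scan1);
Scan scan2 = new Scan();
scan2.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes("table_two"));
scans.add(scan2);
// one mapper class receives rows from every table; inside the mapper,
// ((TableSplit) context.getInputSplit()).getTableName() tells which table a row came from
TableMapReduceUtil.initTableMapperJob(
    scans, MyMultiTableMapper.class, Text.class, Text.class, job);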
Re: Getting error from sqoop2 command
Dear Sqoop users/devs, I am facing an issue, given below. Do you have any idea why I am getting this error and what the problem could be? sqoop:000> show connector --all Exception has occurred during processing command Exception: com.sun.jersey.api.client.UniformInterfaceException Message: GET http://localhost:12000/sqoop/v1/connector/all returned a response status of 404 Not Found After that I set the server and the 404 got resolved, but now I am getting connection refused: sqoop:000> set server --host localhost --port 8050 --webapp sqoop Server is set successfully sqoop:000> show connector --all Exception has occurred during processing command Exception: com.sun.jersey.api.client.ClientHandlerException Message: java.net.ConnectException: Connection refused Regards, samir.
Facing issue using Sqoop2
Dear All, I am getting an error like the one below; has anyone hit this with Sqoop 2? Error: sqoop:000> set server --host hostname1 --port 8050 --webapp sqoop Server is set successfully sqoop:000> show server --all Server host: hostname1 Server port: 8050 Server webapp: sqoop sqoop:000> show version --all client version: Sqoop 1.99.2-cdh4.4.0 revision Compiled by jenkins on Tue Sep 3 20:15:11 PDT 2013 Exception has occurred during processing command Exception: com.sun.jersey.api.client.ClientHandlerException Message: java.net.ConnectException: Connection refused Regards, samir.
How to use a Sqoop command without hard-coding the password
Dear Hadoop/Sqoop users, is there any way to call a Sqoop command without hard-coding the password for the specific RDBMS? If we hard-code the password it becomes a big security issue. Regards, samir.
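Two stock options avoid putting the password in the command line or in a script; the connect string, table and paths below are placeholders, and --password-file only exists on newer Sqoop 1 releases (1.4.4+), so treat that part as an assumption for older installs:

# 1) prompt interactively at run time
sqoop import --connect jdbc:mysql://dbhost/mydb --table mytable --username dbuser -P
# 2) read the password from a restricted file (e.g. chmod 400) instead of the CLI
sqoop import --connect jdbc:mysql://dbhost/mydb --table mytable --username dbuser \
  --password-file /user/hadoop/.db-password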
How to ignore empty files coming out of a Hive map-side join
Dear Hive/Hadoop developers, I was running a Hive map-side join, and along with the output data I could see some empty files produced in the map stage. Why is that, and how can I ignore or avoid these files? Regards, samir.
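One knob often used for this, assuming a Hive release with the merge settings: let Hive run a small merge stage after a map-only job (such as a map-side join) so tiny and empty part files are collapsed. A sketch, with an illustrative threshold value:

-- merge the outputs of map-only jobs before they land in the final directory
SET hive.merge.mapfiles=true;
-- files below this average size (bytes) trigger the merge pass
SET hive.merge.smallfiles.avgsize=16000000;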
Why can't I query a Hive table while data is being inserted into it?
Dear All, did anyone face this issue: while loading a huge dataset into a Hive table, Hive stops me from querying the same table. I have set hive.support.concurrency=true, and it still shows "conflicting lock present for TABLENAME mode SHARED".
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
  <description>Whether hive supports concurrency or not. A zookeeper instance must be up and running for the default hive lock manager to support read-write locks.</description>
</property>
If that is how it behaves, how do I solve the issue? Is there any row-level lock? Regards
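With concurrency enabled and the default ZooKeeper-backed lock manager, the lock that blocks the query can at least be inspected from the Hive CLI; the table name is a placeholder, and the EXTENDED form depends on the Hive version:

SHOW LOCKS tablename;
SHOW LOCKS tablename EXTENDED;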
Error while processing a SequenceFile with LZO compression in a Hive external table (CDH4.3)
Dear All, has anyone faced this type of issue? I am getting an error while processing an LZO-compressed SequenceFile with a Hive query on the CDH4.3.x distribution. Settings and files:
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
-rw-r--r-- 3 myuser supergroup 25172 2013-06-19 21:25 /user/myDir/00_0 -- LZO-compressed sequence file
-rw-r--r-- 3 myuser supergroup 71007 2013-06-19 21:42 /user/myDir/00_0 -- normal sequence file
Now the problem is that if I create an external table on top of the directory to read the data, it gives me an error: Failed with exception java.io.IOException:java.io.EOFException: Premature EOF from inputStream
Table creation:
CREATE EXTERNAL TABLE IF NOT EXISTS MyTable (
  userip string,
  usertid string
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001' ESCAPED BY '\020'
  COLLECTION ITEMS TERMINATED BY '\002'
  MAP KEYS TERMINATED BY '\003'
  LINES TERMINATED BY '\012'
STORED AS SEQUENCEFILE
LOCATION '/path/to/file';
After that, querying the table gives the same error: Failed with exception java.io.IOException:java.io.EOFException: Premature EOF from inputStream
Why is it like that? Regards, samir
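One thing that may be worth checking, as an assumption rather than a confirmed diagnosis: LzopCodec writes the standalone lzop container format and is normally used for whole .lzo files, while compression inside a SequenceFile usually goes through the plain LzoCodec; mixing the two between writer and reader is a plausible source of a premature-EOF failure. A sketch of the alternative settings:

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
SET mapred.output.compression.type=BLOCK;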
How to get the intermediate mapper output file name
Hi all, how can I get the mapper output file name inside the mapper, or how can I change the mapper output file name? By default it looks like part-m-00000, part-m-00001, etc. Regards, samir.
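A sketch for the first part of the question: the default map-output name is part-m-<task id, zero padded>, so it can be reconstructed inside the mapper from the task attempt id. Changing the "part" prefix via the "mapreduce.output.basename" property only works on newer releases, so treat that as an assumption for your version; MultipleOutputs is the other common route for custom names. Class and type choices below are placeholders.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class NamePeekMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
  private String defaultName;

  @Override
  protected void setup(Context context) {
    // the numeric part of part-m-NNNNN is the map task id
    int partition = context.getTaskAttemptID().getTaskID().getId();
    defaultName = String.format("part-m-%05d", partition);
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    context.write(key, value); // pass-through; defaultName is available for use here
  }
}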
Pulling data from secured hadoop cluster to another hadoop cluster
Hi All, I am able to connect to the source Hadoop cluster once SSH is established. But I want to pull some data with distcp from the secured source Hadoop box to another Hadoop cluster, and I am not able to ping the NameNode machine. In this situation, how do I run the distcp command from the target cluster over a secured connection? Source: hadoop.server1 (ssh secured) Target: hadoop.server2 (running distcp here) Command being run: distcp hftp://hadoop.server1:50070/dataSet hdfs://hadoop.server2:54310/targetDataSet Regards, samir.
Re: Pulling data from secured hadoop cluster to another hadoop cluster
It is not a Hadoop security issue; the security is at the host level, I mean at the network level. I cannot ping because the source system is set up so that you can only connect through SSH. If this is the case, how do I overcome the problem? What extra parameter do I need at the SSH level so that I can reach the machine? All the servers are in the same domain. On Tue, May 28, 2013 at 7:35 PM, Shahab Yunus shahab.yu...@gmail.com wrote: Also Samir, when you say 'secured', by any chance that cluster is secured with Kerberos (rather than ssh)? -Shahab On Tue, May 28, 2013 at 8:29 AM, Nitin Pawar nitinpawar...@gmail.com wrote: hadoop daemons do not use ssh to communicate. if your distcp job could not connect to remote server then either the connection was rejected by the target namenode or it was not able to establish the network connection. were you able to see the hdfs on server1 from server2? On Tue, May 28, 2013 at 5:17 PM, samir das mohapatra samir.help...@gmail.com wrote: Hi All, I am able to connect to the source Hadoop cluster once SSH is established. But I want to pull some data with distcp from the secured source Hadoop box to another Hadoop cluster, and I am not able to ping the NameNode machine. In this situation, how do I run the distcp command from the target cluster over a secured connection? Source: hadoop.server1 (ssh secured) Target: hadoop.server2 (running distcp here) Command being run: distcp hftp://hadoop.server1:50070/dataSet hdfs://hadoop.server2:54310/targetDataSet Regards, samir. -- Nitin Pawar
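As Nitin notes, Hadoop itself never uses SSH for data transfer, and blocked ICMP ping does not by itself stop distcp; what matters is TCP reachability of the NameNode and DataNode ports. A few quick checks from the target side, as a sketch; the hostnames and ports are the ones from the distcp command above, and the tunnel line is only a connectivity probe, not a way to run distcp through SSH:

nc -zv hadoop.server1 50070     # NameNode web port used by hftp://
nc -zv hadoop.server1 50010     # a DataNode transfer port, since distcp reads blocks directly
# if only SSH (port 22) is open, a local forward can at least confirm the NameNode web UI is up
ssh -L 50070:localhost:50070 user@hadoop.server1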
Issue with data Copy from CDH3 to CDH4
Hi all, we tried to pull data from an upstream cluster running CDH3 to a downstream system running CDH4, using distcp to copy the data, and it was throwing an exception because of the version difference. I want to know whether there is any way to pull data from CDH3 to CDH4 without doing it manually. What other approach can solve this problem? (The data is about 10 PB.) Regards, samir.
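The usual way around the RPC incompatibility is to run distcp on the newer (CDH4) cluster and read the CDH3 side over the read-only HFTP protocol, which is HTTP-based and works across major versions; hostnames, ports and paths below are placeholders:

hadoop distcp hftp://cdh3-namenode:50070/source/path hdfs://cdh4-namenode:8020/dest/path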
Re: how to copy a table from one hbase cluster to another cluster?
Thanks for the reply. I need to copy the HBase table into another cluster through Java code. Any example will help me. On Wed, Mar 20, 2013 at 8:48 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Samir, Is this what you are looking for? http://hbase.apache.org/book/ops_mgt.html#copytable What kind of help do you need? JM 2013/3/20 samir das mohapatra samir.help...@gmail.com: Hi All, Can you help me to copy one HBase table to another cluster's HBase (table copy)? Regards, samir
Re: how to copy a table from one hbase cluster to another cluster?
Yes, yes, I just thought the same thing. Many, many thanks. On Wed, Mar 20, 2013 at 8:55 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Samir, Have you looked at the link I sent you? You have a command line for that, you have an example, and if you need to do it in Java, you can simply open org.apache.hadoop.hbase.mapreduce.CopyTable, look into it, and do almost the same thing for your needs? JM 2013/3/20 samir das mohapatra samir.help...@gmail.com: Thanks for the reply. I need to copy the HBase table into another cluster through Java code. Any example will help me. On Wed, Mar 20, 2013 at 8:48 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Samir, Is this what you are looking for? http://hbase.apache.org/book/ops_mgt.html#copytable What kind of help do you need? JM 2013/3/20 samir das mohapatra samir.help...@gmail.com: Hi All, Can you help me to copy one HBase table to another cluster's HBase (table copy)? Regards, samir
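For reference, the command-line form documented on the page JM linked looks roughly like this; the destination ZooKeeper quorum and the table name are placeholders, and the Java entry point is the same CopyTable class, whose main()/run() a program can call directly:

hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  --peer.adr=dest-zk1,dest-zk2,dest-zk3:2181:/hbase MyTable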
Re: How to pull Delta data from one cluster to another cluster ?
How do I pull delta data, meaning filtered data rather than the whole dataset? As of now I only know how to copy whole data with distcp. Could you please correct me if I am wrong, or suggest another way to pull efficiently, like getting data based on a filter condition? On Thu, Mar 14, 2013 at 3:43 PM, Mohammad Tariq donta...@gmail.com wrote: Use distcp. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Thu, Mar 14, 2013 at 3:40 PM, samir das mohapatra samir.help...@gmail.com wrote: Regards, samir.
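distcp itself can do a crude incremental copy, though not a real filter: with -update it skips files that already exist on the destination with matching size and checksum, so only new or changed files move. A sketch; the namenode addresses and paths are placeholders:

hadoop distcp -update hdfs://cluster1-nn:8020/data hdfs://cluster2-nn:8020/data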
Re: How to pull Delta data from one cluster to another cluster ?
Will Sqoop support inter-cluster data copy after a filter? Scenario: 1) cluster-1 and cluster-2; 2) taking data from cluster-1 to cluster-2 based on a filter condition. Will Sqoop support that? On Thu, Mar 14, 2013 at 4:19 PM, Tariq donta...@gmail.com wrote: You can do that through Pig. samir das mohapatra samir.help...@gmail.com wrote: How do I pull delta data, meaning filtered data rather than the whole dataset? As of now I only know how to copy whole data with distcp. Could you please correct me if I am wrong, or suggest another way to pull efficiently, like getting data based on a filter condition. On Thu, Mar 14, 2013 at 3:43 PM, Mohammad Tariq donta...@gmail.com wrote: Use distcp. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Thu, Mar 14, 2013 at 3:40 PM, samir das mohapatra samir.help...@gmail.com wrote: Regards, samir. -- Sent from my Android phone with K-9 Mail. Please excuse my brevity.
Is there any way to get a notification from HBase once a record gets updated?
Hi All, is there any way to get a notification from HBase once a record gets updated, like a database trigger? Regards, samir.
Re: How to shuffle (Key,Value) pair from mapper to multiple reducer
You can use a custom Partitioner for the same. Regards, Samir. On Wed, Mar 13, 2013 at 2:29 PM, Vikas Jadhav vikascjadha...@gmail.com wrote: Hi, I am specifying the requirement again with an example. I have a use case where I need to shuffle the same (key,value) pair to multiple reducers. For example, we have the pair (1,ABC) and two reducers (reducer0 and reducer1); by default this pair will go to reducer1 (because (key % numOfReducer) = (1%2)). How should I shuffle this pair to both reducers? I am also willing to change the code of the Hadoop framework if necessary. Thank you On Wed, Mar 13, 2013 at 12:51 PM, feng lu amuseme...@gmail.com wrote: Hi, you can use the Job#setNumReduceTasks(int tasks) method to set the number of reducers for the output. On Wed, Mar 13, 2013 at 2:15 PM, Vikas Jadhav vikascjadha...@gmail.com wrote: Hello, as by default the Hadoop framework shuffles a (key,value) pair to only one reducer, I have a use case where I need to shuffle the same (key,value) pair to multiple reducers. I am also willing to change the code of the Hadoop framework if necessary. Thank you -- Thanx and Regards, Vikas Jadhav -- Don't Grow Old, Grow Up... :-) -- Thanx and Regards, Vikas Jadhav
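A sketch of the custom-Partitioner idea, under the usual assumption that the mapper emits the same record once per target reducer with the target id appended to the key (for example "1#0" and "1#1"); the key format and class name here are illustrative, not a standard API:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class TaggedKeyPartitioner extends Partitioner<Text, Text> {
  @Override
  public int getPartition(Text key, Text value, int numPartitions) {
    // key format assumed: "<originalKey>#<targetReducer>"
    String k = key.toString();
    int tag = Integer.parseInt(k.substring(k.lastIndexOf('#') + 1));
    return tag % numPartitions;
  }
}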
Why is Hadoop spawning two map tasks for a file of size 1.5 KB?
Hi All, I have a very fundamental doubt: I have a file of size 1.5 KB and the block size is the default, yet I can see two mappers being created during the job. Could you please help me understand why that is? Regards, samir.
Re: How can I record some position of context in Reduce()?
You can get it through the RecordReader and FileStatus. On Tue, Mar 12, 2013 at 4:08 PM, Roth Effy effyr...@gmail.com wrote: Hi everyone, I want to join the k-v pairs in reduce(), but how do I get the record position? For now, what I thought of is to save the context status, but the Context class doesn't implement a clone constructor. Any help will be appreciated. Thank you very much.
Re: Hadoop cluster hangs on big hive job
The problem I can see in your log file is that there is no free map slot available for the job. I think you have to increase the block size to reduce the number of maps, because you are passing big data as input. The usual approach is to increase 1) the block size, 2) the map-side sort buffer, 3) JVM reuse, etc. regards, samir. On Fri, Mar 8, 2013 at 1:23 AM, Daning Wang dan...@netseer.com wrote: We have a hive query processing zipped csv files. the query was scanning for 10 days (partitioned by date), data for each day around 130G. The problem is not consistent since if you run it again, it might go through. but the problem has never happened on the smaller jobs (like processing only one day's data). We don't have a space issue. I have attached the log file from when the problem happens. it is stuck like the following (just search for 19706 of 49964) 2013-03-05 15:13:51,587 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_19_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:51,811 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_39_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:52,551 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_32_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:52,760 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_00_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:52,946 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_24_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:54,742 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_08_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) Thanks, Daning On Thu, Mar 7, 2013 at 12:21 AM, Håvard Wahl Kongsgård haavard.kongsga...@gmail.com wrote: hadoop logs? On 6. mars 2013 21:04, Daning Wang dan...@netseer.com wrote: We have a 5-node cluster (Hadoop 1.0.4). It hung a couple of times while running big jobs. Basically all the nodes are dead; from the tasktracker's log it looks like it went into some kind of loop forever. All the log entries look like this when the problem happened. Any idea how to debug the issue? Thanks in advance. 
2013-03-05 15:13:19,526 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_12_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:19,552 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_28_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:20,858 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_36_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:21,141 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_16_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:21,486 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_19_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:21,692 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_39_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:22,448 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_32_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:22,643 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_00_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:22,840 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_24_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:24,628 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_08_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:24,723 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_39_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:25,336 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_04_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:25,539 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_43_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:25,545 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_12_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:25,569 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_28_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:25,855 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_24_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:26,876 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_36_0
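For reference, the tuning knobs mentioned above map to these Hadoop 1.x property names; the values are illustrative, not recommendations:

<!-- hdfs-site.xml : larger blocks mean fewer map tasks for the same input -->
<property><name>dfs.block.size</name><value>268435456</value></property>
<!-- mapred-site.xml : bigger map-side sort buffer -->
<property><name>io.sort.mb</name><value>256</value></property>
<!-- mapred-site.xml : reuse task JVMs within a job (-1 = unlimited) -->
<property><name>mapred.job.reuse.jvm.num.tasks</name><value>-1</value></property>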
Re: Need help optimizing reducer
Austin, I think you have to use a custom Partitioner to make use of more than one reducer for a small data set. The default partitioning will keep routing the keys the same way; you would have to override it and implement your own logic to spread the keys over more than one reducer. On Tue, Mar 5, 2013 at 1:27 AM, Austin Chungath austi...@gmail.com wrote: Hi all, I have 1 reducer and I have around 600 thousand unique keys coming to it. The total data is only around 30 mb. My logic doesn't allow me to have more than 1 reducer. It's taking too long to complete, around 2 hours. (till 66% it's fast then it slows down / I don't really think it has started doing anything till 66% but then why does it show like that?). Are there any job execution parameters that can help improve reducer performance? Any suggestions to improve things when we have to live with just one reducer? thanks, Austin
Re: Issue with sqoop and HANA/ANY DB Schema name
Any help... On Fri, Mar 1, 2013 at 12:06 PM, samir das mohapatra samir.help...@gmail.com wrote: Hi All, I am facing one problem: how do I specify the schema name before the table name while executing the sqoop import statement? $ sqoop import --connect jdbc:sap://host:port/db_name --driver com.sap.db.jdbc.Driver --table SchemaName.Test -m 1 --username --password --target-dir /input/Test1 --verbose Note: without the schema name the sqoop import above works fine, but after adding the schema name it shows an error. Error Logs: Regards, samir.
Re: Issue in Datanode (using CDH4.1.2)
A few more things: the same setup was working on an Ubuntu machine (dev cluster); it is only failing on CentOS 6.3 (prod cluster). On Thu, Feb 28, 2013 at 9:06 PM, samir das mohapatra samir.help...@gmail.com wrote: Hi All, I am facing a strange issue. In a cluster of 1k machines I am able to start and stop the NN, DN, JT, TT and SNN. But the problem is that the NameNode web URL is showing only one datanode. I tried connecting to the nodes through ssh and that works fine, and I have set the NN URL and port in core-site (http://namenode:50070). I also checked the datanode logs and got messages like this: 2013-02-28 06:59:01,652 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoophost1/192.168.1.1:54310 2013-02-28 06:59:07,660 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoophost1/192.168.1.1:54310. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS) Regards, samir.
Issue with sqoop and HANA/ANY DB Schema name
Hi All, I am facing one problem: how do I specify the schema name before the table name while executing the sqoop import statement? $ sqoop import --connect jdbc:sap://host:port/db_name --driver com.sap.db.jdbc.Driver --table SchemaName.Test -m 1 --username --password --target-dir /input/Test1 --verbose Note: without the schema name the sqoop import above works fine, but after adding the schema name it shows an error. Error Logs: Regards, samir.
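One workaround that often helps with the generic JDBC manager, sketched under the assumption that the driver accepts a schema-qualified name inside a plain SELECT: move the qualified table name into a free-form query instead of --table. Credentials and paths are placeholders, and Sqoop requires the literal $CONDITIONS token even with a single mapper:

sqoop import \
  --connect jdbc:sap://host:port/db_name \
  --driver com.sap.db.jdbc.Driver \
  --username myuser --password '***' \
  --query 'SELECT * FROM SchemaName.Test WHERE $CONDITIONS' \
  -m 1 --target-dir /input/Test1 --verbose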
How to use sqoop import
Hi All, can anyone share an example of how to run a sqoop import from the results of a SQL statement? For example, after sqoop import --connect jdbc:... --driver xxx, if I specify --query with a select statement, it is not even recognized as a valid sqoop option. Regards, samir.
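A sketch of a free-form query import with Sqoop 1; the connect string, query and paths are placeholders. Three extra requirements apply when --query replaces --table: the literal token $CONDITIONS must appear in the WHERE clause, --target-dir must be given explicitly, and either --split-by <column> or -m 1 is needed:

sqoop import \
  --connect jdbc:mysql://dbhost/mydb --username dbuser -P \
  --query 'SELECT o.id, o.total FROM orders o WHERE o.total > 100 AND $CONDITIONS' \
  --split-by o.id \
  --target-dir /user/hadoop/orders_big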
How to take a whole database from an RDBMS to HDFS instead of table by table
Hi All, using sqoop, how do I take an entire database into HDFS instead of table by table? How did you do it? Is there some trick? Regards, samir.
Re: How to take Whole Database From RDBMS to HDFS Instead of Table/Table
thanks all. On Wed, Feb 27, 2013 at 4:41 PM, Jagat Singh jagatsi...@gmail.com wrote: You might want to read this http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_import_all_tables_literal On Wed, Feb 27, 2013 at 10:09 PM, samir das mohapatra samir.help...@gmail.com wrote: Hi All, Using sqoop how to take entire database table into HDFS insted of Table by Table ?. How do you guys did it? Is there some trick? Regards, samir.
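A sketch of the command from the user-guide section Jagat linked; the connect string and warehouse directory are placeholders, and every table needs a primary key unless the mapper count is forced to one:

sqoop import-all-tables \
  --connect jdbc:mysql://dbhost/mydb --username dbuser -P \
  --warehouse-dir /user/hadoop/fulldb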
Re: How to take Whole Database From RDBMS to HDFS Instead of Table/Table
Is it a good way to take a total of 5 PB of data through a Java/JDBC program? On Wed, Feb 27, 2013 at 5:56 PM, Michel Segel michael_se...@hotmail.com wrote: I wouldn't use sqoop if you are taking everything. Simpler to write your own java/jdbc program that writes its output to HDFS. Just saying... Sent from a remote device. Please excuse any typos... Mike Segel On Feb 27, 2013, at 5:15 AM, samir das mohapatra samir.help...@gmail.com wrote: thanks all. On Wed, Feb 27, 2013 at 4:41 PM, Jagat Singh jagatsi...@gmail.com wrote: You might want to read this http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_import_all_tables_literal On Wed, Feb 27, 2013 at 10:09 PM, samir das mohapatra samir.help...@gmail.com wrote: Hi All, Using sqoop, how to take an entire database into HDFS instead of table by table? How did you do it? Is there some trick? Regards, samir.
Fwd: ISSUE IN CDH4.1.2: transferring data between different HDFS clusters (using distcp)
-- Forwarded message -- From: samir das mohapatra samir.help...@gmail.com Date: Mon, Feb 25, 2013 at 3:05 PM Subject: ISSUE IN CDH4.1.2 : transfer data between different HDFS clusters.(using distch) To: cdh-u...@cloudera.org Hi All, I am getting bellow error , can any one help me on the same issue, ERROR LOG: -- hadoop@hadoophost2:~$ hadoop distcp hdfs:// 10.192.200.170:50070/tmp/samir.txt hdfs://10.192.244.237:50070/input 13/02/25 01:34:36 INFO tools.DistCp: srcPaths=[hdfs:// 10.192.200.170:50070/tmp/samir.txt] 13/02/25 01:34:36 INFO tools.DistCp: destPath=hdfs:// 10.192.244.237:50070/input With failures, global counters are inaccurate; consider running with -i Copy failed: java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.; Host Details : local host is: hadoophost2/10.192.244.237; destination host is: bl1slu040.corp.adobe.com:50070; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759) at org.apache.hadoop.ipc.Client.call(Client.java:1164) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at $Proxy9.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at $Proxy9.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1507) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:783) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1257) at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:636) at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656) at org.apache.hadoop.tools.DistCp.run(DistCp.java:881) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.tools.DistCp.main(DistCp.java:908) Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag. 
at com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:73) at com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124) at com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:213) at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:746) at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:238) at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:282) at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:760) at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:288) at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:752) at org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcPayloadHeaderProtos.java:985) at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:882) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:813) Regards, samir
Re: ISSUE IN CDH4.1.2: transferring data between different HDFS clusters (using distcp)
yes On Mon, Feb 25, 2013 at 3:30 PM, Nitin Pawar nitinpawar...@gmail.com wrote: does this match with your issue https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/kIPOvrFaQE8 -- Nitin Pawar
Re: ISSUE IN CDH4.1.2: transferring data between different HDFS clusters (using distcp)
I am using CDH4.1.2 with MRv1, not YARN. On Mon, Feb 25, 2013 at 3:47 PM, samir das mohapatra samir.help...@gmail.com wrote: yes On Mon, Feb 25, 2013 at 3:30 PM, Nitin Pawar nitinpawar...@gmail.com wrote: does this match with your issue https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/kIPOvrFaQE8 -- Nitin Pawar
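A note on the stack trace above: the "Protocol message end-group tag did not match expected tag" error typically shows up when an hdfs:// URI is pointed at the NameNode HTTP port (50070) instead of the RPC port. Two sketches, with the CDH default ports as an assumption (adjust to whatever fs.default.name actually says on each cluster):

hadoop distcp hdfs://10.192.200.170:8020/tmp/samir.txt hdfs://10.192.244.237:8020/input
# or read the source over HFTP, which really does listen on 50070
hadoop distcp hftp://10.192.200.170:50070/tmp/samir.txt hdfs://10.192.244.237:8020/input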
Re: ISSUE :Hadoop with HANA using sqoop
) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: com.sap.db.jdbc.exceptions.JDBCDriverException: SAP DBTech JDBC: [257]: sql syntax error: incorrect syntax near .: line 1 col 46 (at pos 46) at com.sap.db.jdbc.exceptions.SQLExceptionSapDB.createException(SQLExceptionSapDB.java:334) at com.sap.db.jdbc.exceptions.SQLExceptionSapDB.generateDatabaseException(SQLExceptionSapDB.java:174) at com.sap.db.jdbc.packet.ReplyPacket.buildExceptionChain(ReplyPacket.java:103) at com.sap.db.jdbc.ConnectionSapDB.execute(ConnectionSapDB.java:848) at com.sap.db.jdbc.CallableStatementSapDB.sendCommand(CallableStatementSapDB.java:1874) at com.sap.db.jdbc.StatementSapDB.sendSQL(StatementSapDB.java:945) at com.sap.db.jdbc.CallableStatementSapDB.doParse(CallableStatementSapDB.java:230) at com.sap.db.jdbc.CallableStatementSapDB.constructor(CallableStatementSapDB.java:190) at com.sap.db.jdbc.CallableStatementSapDB.init(CallableStatementSapDB.java:101) at com.sap.db.jdbc.CallableStatementSapDBFinalize.init(CallableStatementSapDBFinalize.java:31) at com.sap.db.jdbc.ConnectionSapDB.prepareStatement(ConnectionSapDB.java:1088) at com.sap.db.jdbc.trace.Connection.prepareStatement(Connection.java:347) at org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:101) at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:236) ... 12 more 2013-02-20 23:10:23,906 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task On Thu, Feb 21, 2013 at 12:03 PM, Harsh J ha...@cloudera.com wrote: The error is truncated, check the actual failed task's logs for complete info: Caused by: com.sap… what? Seems more like a SAP side fault than a Hadoop side one and you should ask on their forums with the stacktrace posted. On Thu, Feb 21, 2013 at 11:58 AM, samir das mohapatra samir.help...@gmail.com wrote: Hi All Can you plese tell me why I am getting error while loading data from SAP HANA to Hadoop HDFS using sqoop (4.1.2). Error Log: java.io.IOException: SQLException in nextKeyValue at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:265) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:458) at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:76) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:85) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:139) at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:182) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: com.sap Regards, samir. -- -- Harsh J
Fwd: Delivery Status Notification (Failure)
Hi All, I wanted to know how to connect Hive (Hadoop CDH4 distribution) with MicroStrategy. Any help is very welcome; waiting for your response. Note: it is a little bit urgent; does anyone have experience with this? Thanks, samir
Re: Hive Metastore DB Issue ( Cloudera CDH4.1.2 MRv1 with hive-0.9.0-cdh4.1.2)
Hi Suresh, thanks for the advice, but please don't be so restrictive about where the question goes; the point is finding a solution, not policing the problem. Note: I am looking for input from any user; it doesn't matter which list, because it is a common-use scenario. On Fri, Feb 8, 2013 at 3:31 AM, Suresh Srinivas sur...@hortonworks.com wrote: Please only use CDH mailing list and do not copy this to hdfs-user. On Thu, Feb 7, 2013 at 7:20 AM, samir das mohapatra samir.help...@gmail.com wrote: Any Suggestion... On Thu, Feb 7, 2013 at 4:17 PM, samir das mohapatra samir.help...@gmail.com wrote: Hi All, I could not see the Hive metastore DB in MySQL under the MySQL user hadoop. Example:
$ mysql -u root -p
$ Add the hadoop user (CREATE USER 'hadoop'@'localhost' IDENTIFIED BY 'hadoop';)
$ GRANT ALL ON *.* TO 'hadoop'@'%' IDENTIFIED BY 'hadoop'
$ Example (GRANT ALL PRIVILEGES ON *.* TO 'hadoop'@'localhost' IDENTIFIED BY 'hadoop' WITH GRANT OPTION;)
I am following the configuration below:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hadoop?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hadoop</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hadoop</value>
</property>
Note: previously I was using CDH3 and it was creating the metastore DB in MySQL perfectly, but when I changed from CDH3 to CDH4.1.2 with the Hive version in the subject line, it is not being created. Any suggestions? Regards, samir. -- http://hortonworks.com/download/
Why are all map tasks (from a custom Java MapReduce program) assigned to one node?
Hi All, I am using CDH4 with MRv1. When I run any Hadoop MapReduce program from Java, all the map tasks are assigned to one node. It is supposed to distribute the map tasks among the cluster's nodes. Note: 1) my JobTracker web UI is showing 500 nodes; 2) when it comes to the reducer, it is spawned on another node (other than the map node). Can anyone guide me on why it is like this? Regards, samir.
How to Integrate MicroStrategy with Hadoop
Hi All, I wanted to know how to connect Hadoop with MicroStrategy. Any help is very welcome; waiting for your response. Note: any URL or example will be really helpful for me. Thanks, samir
How to Integrate SAP HANA WITH Hadoop
Hi all, we need connectivity between SAP HANA and Hadoop. If you have any experience with that, can you please share some documents and examples with me? It would be a real help. Thanks, samir
Re: How to Integrate MicroStrategy with Hadoop
We are using Cloudera Hadoop. On Thu, Jan 31, 2013 at 2:12 AM, samir das mohapatra samir.help...@gmail.com wrote: Hi All, I wanted to know how to connect Hadoop with MicroStrategy. Any help is very welcome; waiting for your response. Note: any URL or example will be really helpful for me. Thanks, samir
Recommendation required for the right Hadoop distribution (CDH or Hortonworks)
Hi All, my company wants to pick the right Apache Hadoop distribution for production as well as dev. Can anyone suggest which one will be good for the future? Hint: they want to know both the pros and the cons. Regards, samir.
Re: What is the best way to load data from one cluster to another cluster (Urgent requirement)
thanks all. On Thu, Jan 31, 2013 at 11:19 AM, Satbeer Lamba satbeer.la...@gmail.com wrote: I might be wrong but have you considered distcp? On Jan 31, 2013 11:15 AM, samir das mohapatra samir.help...@gmail.com wrote: Hi All, does anyone know how to load data from one Hadoop cluster (CDH4) to another cluster (CDH4)? The project needs are: 1) it should be a delta or incremental load; 2) it should be based on a timestamp; 3) the data volume is 5 PB. Any help? Regards, samir.
Re: Hadoop Nutch Mkdirs failed to create file
Just try applying chmod -R 755 /home/wj/apps/apache-nutch-1.6 and then try again. On Wed, Jan 23, 2013 at 9:23 PM, 吴靖 qhwj2...@126.com wrote: Hi, everyone! I want to use Nutch to crawl web pages, but a problem comes up, as in the log below. I think it may be a permissions problem, but I am not sure. Any help will be appreciated, thank you. 2013-01-23 07:37:21,809 ERROR mapred.FileOutputCommitter - Mkdirs failed to create file:/home/wj/apps/apache-nutch-1.6/bin/crawl/crawldb/190684692/_temporary 2013-01-23 07:37:24,836 WARN mapred.LocalJobRunner - job_local_0002 java.io.IOException: The temporary job-output directory file:/home/wj/apps/apache-nutch-1.6/bin/crawl/crawldb/190684692/_temporary doesn't exist! at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250) at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244) at org.apache.hadoop.mapred.MapFileOutputFormat.getRecordWriter(MapFileOutputFormat.java:46) at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.init(ReduceTask.java:448) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:490) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
Re: different input/output formats
PFA. On Wed, May 30, 2012 at 2:45 AM, Mark question markq2...@gmail.com wrote: Hi Samir, can you email me your main class.. or if you can check mine, it is as follows:

public class SortByNorm1 extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.printf("Usage: bin/hadoop jar norm1.jar <inputDir> <outputDir>\n");
      ToolRunner.printGenericCommandUsage(System.err);
      return -1;
    }
    JobConf conf = new JobConf(new Configuration(), SortByNorm1.class);
    conf.setJobName("SortDocByNorm1");
    conf.setMapperClass(Norm1Mapper.class);
    conf.setMapOutputKeyClass(FloatWritable.class);
    conf.setMapOutputValueClass(Text.class);
    conf.setNumReduceTasks(0);
    conf.setReducerClass(Norm1Reducer.class);
    conf.setOutputKeyClass(FloatWritable.class);
    conf.setOutputValueClass(Text.class);
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(SequenceFileOutputFormat.class);
    TextInputFormat.addInputPath(conf, new Path(args[0]));
    SequenceFileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new SortByNorm1(), args);
    System.exit(exitCode);
  }
}

On Tue, May 29, 2012 at 1:55 PM, samir das mohapatra samir.help...@gmail.com wrote: Hi Mark, see the output for that same application; I am not getting any error. On Wed, May 30, 2012 at 1:27 AM, Mark question markq2...@gmail.com wrote: Hi guys, this is a very simple program, trying to use TextInputFormat and SequenceFileOutputFormat. Should be easy but I get the same error. Here is my configuration:

    conf.setMapperClass(myMapper.class);
    conf.setMapOutputKeyClass(FloatWritable.class);
    conf.setMapOutputValueClass(Text.class);
    conf.setNumReduceTasks(0);
    conf.setOutputKeyClass(FloatWritable.class);
    conf.setOutputValueClass(Text.class);
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(SequenceFileOutputFormat.class);
    TextInputFormat.addInputPath(conf, new Path(args[0]));
    SequenceFileOutputFormat.setOutputPath(conf, new Path(args[1]));

myMapper class is:

public class myMapper extends MapReduceBase implements Mapper<LongWritable, Text, FloatWritable, Text> {
  public void map(LongWritable offset, Text val, OutputCollector<FloatWritable, Text> output, Reporter reporter) throws IOException {
    output.collect(new FloatWritable(1), val);
  }
}

But I get the following error:

12/05/29 12:54:31 INFO mapreduce.Job: Task Id : attempt_201205260045_0032_m_00_0, Status : FAILED
java.io.IOException: wrong key class: org.apache.hadoop.io.LongWritable is not class org.apache.hadoop.io.FloatWritable
at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:998)
at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:75)
at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:705)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:508)
at filter.stat.cosine.preprocess.SortByNorm1$Norm1Mapper.map(SortByNorm1.java:59)
at filter.stat.cosine.preprocess.SortByNorm1$Norm1Mapper.map(SortByNorm1.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:397)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.Use

Where is the writing of LongWritable coming from?? Thank you, Mark
Re: How to mapreduce in the scenario
Yes . Hadoop Is only for Huge Dataset Computaion . May not good for small dataset. On Wed, May 30, 2012 at 6:53 AM, liuzhg liu...@cernet.com wrote: Hi, Mike, Nitin, Devaraj, Soumya, samir, Robert Thank you all for your suggestions. Actually, I want to know if hadoop has any advantage than routine database in performance for solving this kind of problem ( join data ). Best Regards, Gump On Tue, May 29, 2012 at 6:53 PM, Soumya Banerjee soumya.sbaner...@gmail.com wrote: Hi, You can also try to use the Hadoop Reduce Side Join functionality. Look into the contrib/datajoin/hadoop-datajoin-*.jar for the base MAP and Reduce classes to do the same. Regards, Soumya. On Tue, May 29, 2012 at 4:10 PM, Devaraj k devara...@huawei.com wrote: Hi Gump, Mapreduce fits well for solving these types(joins) of problem. I hope this will help you to solve the described problem.. 1. Mapoutput key and value classes : Write a map out put key class(Text.class), value class(CombinedValue.class). Here value class should be able to hold the values from both the files(a.txt and b.txt) as shown below. class CombinedValue implements WritableComparator { String name; int age; String address; boolean isLeft; // flag to identify from which file } 2. Mapper : Write a map() function which can parse from both the files(a.txt, b.txt) and produces common output key and value class. 3. Partitioner : Write the partitioner in such a way that it will Send all the (key, value) pairs to same reducer which are having same key. 4. Reducer : In the reduce() function, you will receive the records from both the files and you can combine those easily. Thanks Devaraj From: liuzhg [liu...@cernet.com] Sent: Tuesday, May 29, 2012 3:45 PM To: common-user@hadoop.apache.org Subject: How to mapreduce in the scenario Hi, I wonder that if Hadoop can solve effectively the question as following: == input file: a.txt, b.txt result: c.txt a.txt: id1,name1,age1,... id2,name2,age2,... id3,name3,age3,... id4,name4,age4,... b.txt: id1,address1,... id2,address2,... id3,address3,... c.txt id1,name1,age1,address1,... id2,name2,age2,address2,... I know that it can be done well by database. But I want to handle it with hadoop if possible. Can hadoop meet the requirement? Any suggestion can help me. Thank you very much! Best Regards, Gump
Re: Small glitch with setting up two node cluster...only secondary node starts (datanode and namenode don't show up in jps)
In your log details I could not find the NN starting; it is a problem with the NN itself. Harsh also suggested the same. On Sun, May 27, 2012 at 10:51 PM, Rohit Pandey rohitpandey...@gmail.com wrote: Hello Hadoop community, I have been trying to set up a double node Hadoop cluster (following the instructions in - http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ ) and am very close to running it apart from one small glitch - when I start the dfs (using start-dfs.sh), it says: 10.63.88.53: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-pandro51-datanode-ubuntu.out 10.63.88.109: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-pandro51-datanode-pandro51-OptiPlex-960.out 10.63.88.109: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-pandro51-secondarynamenode-pandro51-OptiPlex-960.out starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-pandro51-jobtracker-pandro51-OptiPlex-960.out 10.63.88.109: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-pandro51-tasktracker-pandro51-OptiPlex-960.out 10.63.88.53: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-pandro51-tasktracker-ubuntu.out which looks like it's been successful in starting all the nodes. However, when I check them out by running 'jps', this is what I see: 27531 SecondaryNameNode 27879 Jps As you can see, there is no datanode and name node. I have been racking my brains at this for quite a while now. Checked all the inputs and every thing. Any one know what the problem might be? -- Thanks in advance, Rohit
Re: Small glitch with setting up two node cluster...only secondary node starts (datanode and namenode don't show up in jps)
Step-wise details (Ubuntu 10.x version). Go through it properly and run one step at a time; it will solve your problem (you can change the path, IP and host name as you like):

1. Start the terminal.
2. Disable ipv6 on all machines: pico /etc/sysctl.conf and add these lines at the EOF:
   net.ipv6.conf.all.disable_ipv6 = 1
   net.ipv6.conf.default.disable_ipv6 = 1
   net.ipv6.conf.lo.disable_ipv6 = 1
3. Reboot the system: sudo reboot
4. Install java: sudo apt-get install openjdk-6-jdk openjdk-6-jre
5. Check if ssh is installed, if not do so: sudo apt-get install openssh-server openssh-client
6. Create a group and user called hadoop:
   sudo addgroup hadoop
   sudo adduser --ingroup hadoop hadoop
7. Assign all the permissions to the hadoop user: sudo visudo and add the following line in the file:
   hadoop ALL=(ALL) ALL
8. Check that the hadoop user has ssh set up:
   su hadoop
   ssh-keygen -t rsa -P ""          (press Enter when asked)
   cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
   ssh localhost
   Copy the server's RSA public key from the server to all nodes into the authorized_keys file as shown in the step above.
9. Make the hadoop installation directory: sudo mkdir /usr/local/hadoop
10. Download hadoop:
    cd /usr/local/hadoop
    sudo wget -c http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u2.tar.gz
11. Unzip the tar: sudo tar -zxvf /usr/local/hadoop/hadoop-0.20.2-cdh3u2.tar.gz
12. Change permissions on the hadoop folder by granting all to hadoop:
    sudo chown -R hadoop:hadoop /usr/local/hadoop
    sudo chmod 750 -R /usr/local/hadoop
13. Create the HDFS directory (inside the /usr/local/hadoop folder):
    sudo mkdir hadoop-datastore
    sudo mkdir hadoop-datastore/hadoop-hadoop
14. Add the binaries path and hadoop home in the environment file:
    sudo pico /etc/environment    (set the bin path as well as the hadoop home path)
    source /etc/environment
15. Configure the hadoop-env.sh file:
    cd /usr/local/hadoop/hadoop-0.20.2-cdh3u3/
    sudo pico conf/hadoop-env.sh
    and add the following lines in there:
    export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
    export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
16. Configure core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/hadoop-datastore/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://IP of namenode:54310</value>
    <description>Location of the Namenode</description>
  </property>
</configuration>
17. Configure hdfs-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Default block replication.</description>
  </property>
</configuration>
18. Configure mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>IP of job tracker:54311</value>
    <description>Host and port of the jobtracker.</description>
  </property>
</configuration>
19. Add all the IP addresses in the conf/slaves file:
    sudo pico /usr/local/hadoop/hadoop-0.20.2-cdh3u2/conf/slaves
    Add the list of IP addresses that will host data nodes in this file.

Hadoop commands (now restart the hadoop cluster):
start-all.sh / stop-all.sh
start-dfs.sh / stop-dfs.sh
start-mapred.sh / stop-mapred.sh
hadoop dfs -ls <virtual dfs path>
hadoop dfs -copyFromLocal <local path> <dfs path>
Re: different input/output formats
Hi, I think an attachment will not get through on common-user@hadoop.apache.org, so please have a look below.

MAP

package test;

import java.io.IOException;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class myMapper extends MapReduceBase implements Mapper<LongWritable, Text, FloatWritable, Text> {
    public void map(LongWritable offset, Text val, OutputCollector<FloatWritable, Text> output, Reporter reporter) throws IOException {
        output.collect(new FloatWritable(1), val);
    }
}

REDUCER -- prepare the reducer for whatever exactly you want it to do (a minimal sketch is shown after this thread).

JOB

package test;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class TestDemo extends Configured implements Tool {
    public static void main(String args[]) throws Exception {
        int res = ToolRunner.run(new Configuration(), new TestDemo(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(TestDemo.class);
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        conf.setJobName("TestCustomInputOutput");
        conf.setMapperClass(myMapper.class);
        conf.setMapOutputKeyClass(FloatWritable.class);
        conf.setMapOutputValueClass(Text.class);
        conf.setNumReduceTasks(0);
        conf.setOutputKeyClass(FloatWritable.class);
        conf.setOutputValueClass(Text.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(SequenceFileOutputFormat.class);
        TextInputFormat.addInputPath(conf, new Path(args[0]));
        SequenceFileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
        return 0;
    }
}

On Wed, May 30, 2012 at 6:57 PM, samir das mohapatra samir.help...@gmail.com wrote: PFA. On Wed, May 30, 2012 at 2:45 AM, Mark question markq2...@gmail.com wrote: Hi Samir, can you email me your main class..
or if you can check mine, it is as follows:

public class SortByNorm1 extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.printf("Usage: bin/hadoop jar norm1.jar <inputDir> <outputDir>\n");
            ToolRunner.printGenericCommandUsage(System.err);
            return -1;
        }
        JobConf conf = new JobConf(new Configuration(), SortByNorm1.class);
        conf.setJobName("SortDocByNorm1");
        conf.setMapperClass(Norm1Mapper.class);
        conf.setMapOutputKeyClass(FloatWritable.class);
        conf.setMapOutputValueClass(Text.class);
        conf.setNumReduceTasks(0);
        conf.setReducerClass(Norm1Reducer.class);
        conf.setOutputKeyClass(FloatWritable.class);
        conf.setOutputValueClass(Text.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(SequenceFileOutputFormat.class);
        TextInputFormat.addInputPath(conf, new Path(args[0]));
        SequenceFileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new SortByNorm1(), args);
        System.exit(exitCode);
    }

On Tue, May 29, 2012 at 1:55 PM, samir das mohapatra samir.help...@gmail.com wrote: Hi Mark, see the output for that same application. I am not getting any error. On Wed, May 30, 2012 at 1:27 AM, Mark question markq2...@gmail.com wrote: Hi guys, this is a very simple program, trying to use TextInputFormat and SequenceFileOutputFormat. Should be easy but I get the same error. Here are my configurations: conf.setMapperClass(myMapper.class); conf.setMapOutputKeyClass(FloatWritable.class); conf.setMapOutputValueClass(Text.class); conf.setNumReduceTasks(0); conf.setOutputKeyClass
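Since the REDUCER slot in my code above is left open, here is a minimal identity-style reducer matching the map output types. This is only a sketch (the job above disables reduces with setNumReduceTasks(0), so you would only need it if you turn reduces back on):

package test;

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class myReducer extends MapReduceBase implements Reducer<FloatWritable, Text, FloatWritable, Text> {
    public void reduce(FloatWritable key, Iterator<Text> values,
            OutputCollector<FloatWritable, Text> output, Reporter reporter) throws IOException {
        // pass every value through unchanged
        while (values.hasNext()) {
            output.collect(key, values.next());
        }
    }
}

To run the job above, something along the lines of (jar name assumed): hadoop jar test.jar test.TestDemo <input path> <output path>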
Re: How to Integrate LDAP in Hadoop ?
It is the Cloudera version, 0.20. On Tue, May 29, 2012 at 4:14 PM, Michel Segel michael_se...@hotmail.com wrote: Which release? Version? I believe there are variables in the *-site.xml that allow LDAP integration ... Sent from a remote device. Please excuse any typos... Mike Segel On May 26, 2012, at 7:40 AM, samir das mohapatra samir.help...@gmail.com wrote: Hi All, Did anyone work on hadoop with LDAP integration? Please help me with the same. Thanks samir
Re: How to mapreduce in the scenario
Yes, it is possible by using the MultipleInputs format with multiple mappers (basically two different mappers).

Step 1: wire each input to its own mapper class:

MultipleInputs.addInputPath(conf, new Path(args[0]), TextInputFormat.class, *Mapper1.class*);
MultipleInputs.addInputPath(conf, new Path(args[1]), TextInputFormat.class, *Mapper2.class*);

While writing each mapper's output value, prepend an identifier (*output.collect(new Text(key), new Text(identifier + "~" + value));*) related to a.txt and b.txt, so that it is easy to tell the two mappers' output apart within the reducer.

Step 2: (you can also put b.txt in the DistributedCache and compare the reducer values against the b.txt list.) In the reducer, split each value and check the identifier:

String currValue = values.next().toString();
String valueSplitted[] = currValue.split("~");
if (valueSplitted[0].equals("A")) // A: identifier from the A mapper
{
    // process the A file here
}
else if (valueSplitted[0].equals("B")) // B: identifier from the B mapper
{
    // process the B file here
}
output.collect(new Text(key), new Text("formatted value, as you want it displayed"));

Decide the key according to the result you want to produce. After that, use one reducer to produce the output. (A fuller end-to-end sketch appears right after this thread.)

thanks samir

On Tue, May 29, 2012 at 3:45 PM, liuzhg liu...@cernet.com wrote: Hi, I wonder whether Hadoop can effectively solve the following question: == input files: a.txt, b.txt; result: c.txt. a.txt: id1,name1,age1,... id2,name2,age2,... id3,name3,age3,... id4,name4,age4,... b.txt: id1,address1,... id2,address2,... id3,address3,... c.txt: id1,name1,age1,address1,... id2,name2,age2,address2,... I know that it can be done well by a database. But I want to handle it with hadoop if possible. Can hadoop meet the requirement? Any suggestion can help me. Thank you very much! Best Regards, Gump
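A rough end-to-end sketch of the approach above, using the old mapred API (the "~" separator, the field positions, and the class names are assumptions; adjust them to the real layouts of a.txt and b.txt):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.MultipleInputs;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class JoinJob extends Configured implements Tool {

    // Mapper for a.txt: key = id, value = "A~name,age,..."
    public static class AMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable offset, Text line, OutputCollector<Text, Text> out, Reporter r) throws IOException {
            String[] f = line.toString().split(",", 2);      // id, rest of the record
            out.collect(new Text(f[0]), new Text("A~" + f[1]));
        }
    }

    // Mapper for b.txt: key = id, value = "B~address,..."
    public static class BMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable offset, Text line, OutputCollector<Text, Text> out, Reporter r) throws IOException {
            String[] f = line.toString().split(",", 2);
            out.collect(new Text(f[0]), new Text("B~" + f[1]));
        }
    }

    // Reducer: stitch the A part and the B part for the same id
    public static class JoinReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
        public void reduce(Text id, Iterator<Text> values, OutputCollector<Text, Text> out, Reporter r) throws IOException {
            String a = "", b = "";
            while (values.hasNext()) {
                String[] v = values.next().toString().split("~", 2);
                if (v[0].equals("A")) { a = v[1]; } else if (v[0].equals("B")) { b = v[1]; }
            }
            out.collect(id, new Text(a + "," + b));          // id  name,age,...,address,...
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getConf(), JoinJob.class);
        conf.setJobName("JoinAandB");
        MultipleInputs.addInputPath(conf, new Path(args[0]), TextInputFormat.class, AMapper.class);
        MultipleInputs.addInputPath(conf, new Path(args[1]), TextInputFormat.class, BMapper.class);
        conf.setReducerClass(JoinReducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(conf, new Path(args[2]));
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new JoinJob(), args));
    }
}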
Re: different input/output formats
Hi Mark,

public void map(LongWritable offset, Text val, OutputCollector<FloatWritable, Text> output, Reporter reporter) throws IOException {
    output.collect(new FloatWritable(*1*), val); *// change 1 to 1.0f, then it will work.*
}

Let me know the status after the change.

On Wed, May 30, 2012 at 1:27 AM, Mark question markq2...@gmail.com wrote: Hi guys, this is a very simple program, trying to use TextInputFormat and SequenceFileOutputFormat. Should be easy but I get the same error. Here are my configurations: conf.setMapperClass(myMapper.class); conf.setMapOutputKeyClass(FloatWritable.class); conf.setMapOutputValueClass(Text.class); conf.setNumReduceTasks(0); conf.setOutputKeyClass(FloatWritable.class); conf.setOutputValueClass(Text.class); conf.setInputFormat(TextInputFormat.class); conf.setOutputFormat(SequenceFileOutputFormat.class); TextInputFormat.addInputPath(conf, new Path(args[0])); SequenceFileOutputFormat.setOutputPath(conf, new Path(args[1])); myMapper class is: public class myMapper extends MapReduceBase implements Mapper<LongWritable,Text,FloatWritable,Text> { public void map(LongWritable offset, Text val, OutputCollector<FloatWritable,Text> output, Reporter reporter) throws IOException { output.collect(new FloatWritable(1), val); } } But I get the following error: 12/05/29 12:54:31 INFO mapreduce.Job: Task Id : attempt_201205260045_0032_m_00_0, Status : FAILED java.io.IOException: wrong key class: org.apache.hadoop.io.LongWritable is not class org.apache.hadoop.io.FloatWritable at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:998) at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:75) at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:705) at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:508) at filter.stat.cosine.preprocess.SortByNorm1$Norm1Mapper.map(SortByNorm1.java:59) at filter.stat.cosine.preprocess.SortByNorm1$Norm1Mapper.map(SortByNorm1.java:1) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:397) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330) at org.apache.hadoop.mapred.Child$4.run(Child.java:217) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.Use Where is the writing of LongWritable coming from ?? Thank you, Mark
Re: different input/output formats
Hi Mark, see the output for that same application. I am not getting any error.

On Wed, May 30, 2012 at 1:27 AM, Mark question markq2...@gmail.com wrote: Hi guys, this is a very simple program, trying to use TextInputFormat and SequenceFileOutputFormat. Should be easy but I get the same error. Here are my configurations: conf.setMapperClass(myMapper.class); conf.setMapOutputKeyClass(FloatWritable.class); conf.setMapOutputValueClass(Text.class); conf.setNumReduceTasks(0); conf.setOutputKeyClass(FloatWritable.class); conf.setOutputValueClass(Text.class); conf.setInputFormat(TextInputFormat.class); conf.setOutputFormat(SequenceFileOutputFormat.class); TextInputFormat.addInputPath(conf, new Path(args[0])); SequenceFileOutputFormat.setOutputPath(conf, new Path(args[1])); myMapper class is: public class myMapper extends MapReduceBase implements Mapper<LongWritable,Text,FloatWritable,Text> { public void map(LongWritable offset, Text val, OutputCollector<FloatWritable,Text> output, Reporter reporter) throws IOException { output.collect(new FloatWritable(1), val); } } But I get the following error: 12/05/29 12:54:31 INFO mapreduce.Job: Task Id : attempt_201205260045_0032_m_00_0, Status : FAILED java.io.IOException: wrong key class: org.apache.hadoop.io.LongWritable is not class org.apache.hadoop.io.FloatWritable at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:998) at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:75) at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:705) at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:508) at filter.stat.cosine.preprocess.SortByNorm1$Norm1Mapper.map(SortByNorm1.java:59) at filter.stat.cosine.preprocess.SortByNorm1$Norm1Mapper.map(SortByNorm1.java:1) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:397) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330) at org.apache.hadoop.mapred.Child$4.run(Child.java:217) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.Use Where is the writing of LongWritable coming from ?? Thank you, Mark
How to configure application for External jar
Hi All, How do I configure an external jar that is used by the application internally? For example: JDBC, Hive driver, etc. Note: I don't have permission to start and stop the hadoop machines, so I need to configure it at the application level (not at the hadoop level). If we put the jar inside the lib folder of hadoop, then I think we need to restart hadoop; without doing this, is there any other way to do so? Thanks samir
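One common way to do this without touching the cluster's lib directory or restarting anything is to ship the jars with the job itself. A sketch, assuming the driver class goes through ToolRunner/GenericOptionsParser (the paths and class names below are placeholders):

export HADOOP_CLASSPATH=/home/samir/lib/hive-jdbc.jar:/home/samir/lib/mysql-connector.jar
hadoop jar myapp.jar com.example.MyDriver -libjars /home/samir/lib/hive-jdbc.jar,/home/samir/lib/mysql-connector.jar <input> <output>

HADOOP_CLASSPATH covers the client JVM that submits the job, while -libjars copies the jars into the job's distributed cache so the map and reduce tasks see them on their classpath; no Hadoop restart is needed.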
Re: Right way to implement MR ?
Thanks Harsh J for your help. On Thu, May 24, 2012 at 1:24 AM, Harsh J ha...@cloudera.com wrote: Samir, You can use MultipleInputs for multiple forms of inputs per mapper (with their own input K/V types, but common output K/V types) with a common reduce-side join/compare. See http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html . On Thu, May 24, 2012 at 1:17 AM, samir das mohapatra samir.help...@gmail.com wrote: Hi All, How do I compare two input files in an M/R job? Say log file A is around 30 GB and log file B is around 60 GB. I wanted to know how I should define the K,V inside the mapper. Thanks samir. -- Harsh J
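For reference, with the new-API MultipleInputs that Harsh links to, the wiring might look roughly like this (Mapper1, Mapper2, and JoinReducer are placeholder class names; both mappers must emit the same key/value types):

MultipleInputs.addInputPath(job, new Path(aLogPath), TextInputFormat.class, Mapper1.class);
MultipleInputs.addInputPath(job, new Path(bLogPath), TextInputFormat.class, Mapper2.class);
job.setReducerClass(JoinReducer.class);

The reducer then sees the values from both files grouped under the same key and can do the comparison there.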
Re: RemoteException writing files
Hi, this could be due to the following reasons: 1) The *NameNode http://wiki.apache.org/hadoop/NameNode* does not have any available DataNodes. 2) The Namenode was not able to start properly. 3) Otherwise, some IP issue. Note: please mention localhost instead of 127.0.0.1 (if it is local). Follow this URL: http://wiki.apache.org/hadoop/FAQ#What_does_.22file_could_only_be_replicated_to_0_nodes.2C_instead_of_1.22_mean.3F Thanks samir On Sat, May 19, 2012 at 8:59 PM, Todd McFarland toddmcf2...@gmail.com wrote: Hi folks, (Resending to this group, sent to common-dev before, pretty sure that's for Hadoop internal development - sorry for that..) I'm pretty stuck here. I've been researching for hours and I haven't made any forward progress on this one. I have a vmWare installation of Cloudera Hadoop 0.20. The following commands to create a directory and copy a file from the shared folder *work fine*, so I'm confident everything is set up correctly: [cloudera@localhost bin]$ hadoop fs -mkdir /user/cloudera/testdir [cloudera@localhost bin]$ hadoop fs -put /mnt/hgfs/shared_folder/file1.txt /user/cloudera/testdir/file1.txt The file shows up fine in HDFS doing it this way on the Linux VM. *However*, when I try doing the equivalent operation in Java everything works great until I try to close() the FSDataOutputStream. I'm left with the new directory and a zero byte size file. One suspicious thing is that the user is admin instead of cloudera, which I haven't figured out why. Here is the error: 12/05/19 09:45:46 INFO hdfs.DFSClient: Exception in createBlockOutputStream 127.0.0.1:50010 java.net.ConnectException: Connection refused: no further information 12/05/19 09:45:46 INFO hdfs.DFSClient: Abandoning block blk_1931357292676354131_1068 12/05/19 09:45:46 INFO hdfs.DFSClient: Excluding datanode 127.0.0.1:50010 12/05/19 09:45:46 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/admin/testdir/file1.txt could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1533) at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:667) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) There are certainly lots of search references to *could only be replicated to 0 nodes, instead of 1* but chasing down those suggestions hasn't helped. I have run *jps* and *netstat* and those look good. All services are running, all ports seem to be good. The *health check* looks good, plenty of disk space, no failed nodes...
Here is the Java (it fails when it hits fs.close()):

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TestFileTrans {

    public static void main(String[] args) {
        Configuration config = new Configuration();
        config.addResource(new Path("c:/_bigdata/client_libs/core-site.xml"));
        config.addResource(new Path("c:/_bigdata/client_libs/hdfs-site.xml"));
        System.out.println("hadoop.tmp.dir: " + config.get("hadoop.tmp.dir"));
        try {
            FileSystem dfs = FileSystem.get(config);
            // this will default to admin unless the workingDirectory is explicitly set..
            System.out.println("HDFS Working Directory: " + dfs.getWorkingDirectory().toString());
            String dirName = "testdir";
            Path src = new Path(dfs.getWorkingDirectory() + "/" + dirName);
            dfs.mkdirs(src);
            System.out.println("HDFS Directory created: " + dfs.getWorkingDirectory().toString());
            loadFile(dfs, src);
        } catch (IOException e) {
            System.out.println("Error " + e.getMessage());
        }
    }

    private static void loadFile(FileSystem dfs, Path src) throws IOException {
        FileInputStream fis = new FileInputStream("c:/_bigdata/shared_folder/file1.txt");
        int len = fis.available();
        byte[] btr = new byte[len];
        fis.read(btr);
        FSDataOutputStream fs = dfs.create(new Path(src.toString() + "/file1.txt"));
        fs.write(btr);
        fs.flush();
        fs.close();
    }
}

Any help would be greatly appreciated!
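As a side note, once the datanode connectivity issue is sorted out, the manual stream copy in loadFile() can be replaced by letting the client library do the copy; a minimal sketch using the same Configuration object as above (the paths are the ones from Todd's code, kept as placeholders):

FileSystem dfs = FileSystem.get(config);
// copy the local file straight into the working directory's testdir
dfs.copyFromLocalFile(new Path("c:/_bigdata/shared_folder/file1.txt"),
        new Path(dfs.getWorkingDirectory(), "testdir/file1.txt"));
dfs.close();

This does not by itself fix the "could only be replicated to 0 nodes" error, which usually points at the client not being able to reach any datanode (for example the 127.0.0.1:50010 address the VM is advertising).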
Re: hadoop File loading
Hi, your requirement is that your M/R job should see the full XML file while operating. (If that is right, then please try the approach below.) You can put this XML file in the DistributedCache, which is shared across the M/R tasks, so that you get the whole XML instead of a chunk of the data. Thanks Samir On Tue, May 15, 2012 at 11:30 PM, @dataElGrande markydale...@gmail.com wrote: You should check out Pentaho's howto's dealing with Hadoop and MapReducer. Hope this helps! http://wiki.pentaho.com/display/BAD/How+To%27s hari708 wrote: Hi, I have a big file consisting of XML data. The XML is not represented as a single line in the file. If we stream this file using the ./hadoop dfs -put command to a hadoop directory, how does the distribution happen? Basically, in my mapreduce program I am expecting a complete XML as my input. I have a CustomReader (for XML) in my mapreduce job configuration. My main confusion is: if the namenode distributes data to DataNodes, there is a chance that one part of the xml can go to one data node and the other half can go to another datanode. If that is the case, will my custom XMLReader in the mapreduce be able to combine it (as mapreduce reads data locally only)? Please help me on this? -- View this message in context: http://old.nabble.com/hadoop-File-loading-tp32871902p33849683.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
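If it helps, the DistributedCache wiring for the old mapred API might look roughly like this (the HDFS path and the configure() hook are assumptions; adjust them to your own job):

// in the driver, after the XML file has been uploaded to HDFS:
DistributedCache.addCacheFile(new URI("/user/hadoop/config/big.xml"), conf);

// in the mapper's configure(JobConf) method (wrap in try/catch for IOException):
Path[] cached = DistributedCache.getLocalCacheFiles(conf);
// cached[0] now points at a local copy of big.xml that every task can read in full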
Re: Moving files from JBoss server to HDFS
Hi financeturd financet...@yahoo.com, from my point of view the second setup, like below, is the better approach: {Separate server} -- {JBoss server} and then {Separate server} -- HDFS. thanks samir On Sat, May 12, 2012 at 6:00 AM, financeturd financeturd financet...@yahoo.com wrote: Hello, We have a large number of custom-generated files (not just web logs) that we need to move from our JBoss servers to HDFS. Our first implementation ran a cron job every 5 minutes to move our files from the output directory to HDFS. Is this recommended? We are being told by our IT team that our JBoss servers should not have access to HDFS for security reasons. The files must be sucked to HDFS by other servers that do not accept traffic from the outside. In essence, they are asking for a layer of indirection. Instead of: {JBoss server} -- {HDFS} it's being requested that it look like: {Separate server} -- {JBoss server} and then {Separate server} -- HDFS While I understand in principle what is being said, the security of having processes on JBoss servers writing files to HDFS doesn't seem any worse than having Tomcat servers access a central database, which they do. Can anyone comment on what a recommended approach would be? Should our JBoss servers push their data to HDFS or should the data be pulled by another server and then placed into HDFS? Thank you! FT
Re: java.io.IOException: Task process exit with nonzero status of 1
Hi Mohit, 1) Hadoop is more portable with Linux, Ubuntu, or any non-DOS file system, but you are running hadoop on Windows; that could be the problem, because hadoop generates some partial output files for temporary use. 2) Another thing is that you are running hadoop version 0.19; I think if you upgrade the version it will solve your problem, because the example you are using has some problems with file read and write on Windows OS. 3) Check your input file data, because I can see your mapper is also at 0%. 4) If you are all right with the whole scenario, please could you share your logs under hadoopversion/logs? From there itself we can trace it very clearly. Thanks SAMIR On Fri, May 11, 2012 at 12:26 PM, Mohit Kundra mohit@gmail.com wrote: Hi, I am a new user to hadoop. I have installed hadoop 0.19.1 on a single Windows machine. Its http://localhost:50030/jobtracker.jsp and http://localhost:50070/dfshealth.jsp pages are working fine, but when I am executing bin/hadoop jar hadoop-0.19.1-examples.jar pi 5 100 it is showing the below: $ bin/hadoop jar hadoop-0.19.1-examples.jar pi 5 100 cygpath: cannot create short name of D:\hadoop-0.19.1\logs Number of Maps = 5 Samples per Map = 100 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Starting Job 12/05/11 12:07:26 INFO mapred.JobClient: Running job: job_20120513_0002 12/05/11 12:07:27 INFO mapred.JobClient: map 0% reduce 0% 12/05/11 12:07:35 INFO mapred.JobClient: Task Id : attempt_20120513_0002_m_06_ 0, Status : FAILED java.io.IOException: Task process exit with nonzero status of 1. at org.apache.hadoop.mapred.TaskRunner.run (TaskRunner.java:425) Please tell me what is the root cause. regards, Mohit
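If you do share the logs, on a default 0.19.x layout the per-task output usually sits under the installation's logs directory; the exact paths vary with the install, so treat these as an assumption:

hadoop-0.19.1/logs/userlogs/attempt_.../stdout
hadoop-0.19.1/logs/userlogs/attempt_.../stderr
hadoop-0.19.1/logs/userlogs/attempt_.../syslog
hadoop-0.19.1/logs/hadoop-*-tasktracker-*.log

The tasktracker log and the attempt's stderr are typically where the real reason behind "Task process exit with nonzero status of 1" shows up.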