java.net.SocketTimeoutException: read(2) error: Resource temporarily unavailable
I use hbase-0.94 and hadoop-2.2, and I see the exception below:

2014-07-04 12:43:49,700 WARN org.apache.hadoop.hdfs.DFSClient: failed to connect to DomainSocket(fd=322,path=/home/hadoop/hadoop-current/cdh4-dn-socket/dn_socket)
java.net.SocketTimeoutException: read(2) error: Resource temporarily unavailable
    at org.apache.hadoop.net.unix.DomainSocket.readArray0(Native Method)
    at org.apache.hadoop.net.unix.DomainSocket.access$200(DomainSocket.java:47)
    at org.apache.hadoop.net.unix.DomainSocket$DomainInputStream.read(DomainSocket.java:530)
    at java.io.FilterInputStream.read(FilterInputStream.java:66)
    at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:169)
    at org.apache.hadoop.hdfs.BlockReaderFactory.newShortCircuitBlockReader(BlockReaderFactory.java:187)
    at org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:104)
    at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1060)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:898)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1148)
    at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:73)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1388)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1880)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1723)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:365)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.readNextDataBlock(HFileReaderV2.java:633)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:730)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:128)

Why does the exception java.net.SocketTimeoutException: read(2) error: Resource temporarily unavailable appear? Thanks, LiuLei
hdfs cache
I use hadoop-2.4 and I want to use the HDFS cache function. I used the linux command ulimit -l 32212254720 to set the max locked memory size, but I get the error below:

ulimit -l 322
-bash: ulimit: max locked memory: cannot modify limit: Operation not permitted

How can I set the max locked memory size? Thanks, LiuLei
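For reference, a minimal sketch of how this limit is usually raised; the user name and sizes are illustrative. Raising a hard ulimit requires root, which is why the plain ulimit call above fails with "Operation not permitted". The limit is normally set in /etc/security/limits.conf and takes effect at the next login of the DataNode user:

# /etc/security/limits.conf (edit as root; values are examples)
hadoop  soft  memlock  32212254720
hadoop  hard  memlock  32212254720

The HDFS cache itself is then bounded by dfs.datanode.max.locked.memory in hdfs-site.xml, which must stay at or below the memlock limit:

<property>
  <name>dfs.datanode.max.locked.memory</name>
  <value>32212254720</value>
</property>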
heterogeneous storages in HDFS
On April 11 hadoop-2.4 was released, but hadoop-2.4 does not include the heterogeneous storages function. When will hadoop include this function? Thanks, LiuLei
Re: heterogeneous storages in HDFS
When will that hadoop release be available?

2014-04-14 17:04 GMT+08:00 Stanley Shi s...@gopivotal.com:
Please find it in this page: https://wiki.apache.org/hadoop/Roadmap hadoop 2.3.0 only includes phase 1 of the heterogeneous storage; phase 2 will be included in 2.5.0. Regards, Stanley Shi

On Mon, Apr 14, 2014 at 4:38 PM, ascot.m...@gmail.com ascot.m...@gmail.com wrote:
hi, From 2.3.0: "20 February, 2014: Release 2.3.0 available. Apache Hadoop 2.3.0 contains a number of significant enhancements such as: - Support for Heterogeneous Storage hierarchy in HDFS." Is it already there? Ascot

On 14 Apr, 2014, at 4:34 pm, lei liu liulei...@gmail.com wrote:
On April 11 hadoop-2.4 was released, but hadoop-2.4 does not include the heterogeneous storages function. When will hadoop include this function? Thanks, LiuLei
download hadoop-2.4
Hadoop-2.4 is released; where can I download the hadoop-2.4 code? Thanks, LiuLei
HDFS Client write data is slow
I use HBase-0.94 and hadoop-2.0. I installed one HDFS cluster that has 15 datanodes. If the network bandwidth of two datanodes is saturated (for example at 100M/s), the write performance of the entire HDFS cluster is slow. I think the slow datanodes affect the write performance of the entire cluster. How can the HDFS client avoid writing data to the slow datanodes? Thanks, LiuLei
datanode is slow
I use HBase0.94 and CDH4. There are 25729 TCP connections on one machine, for example:

hadoop@apayhbs081 ~ $ netstat -a | wc -l
25729

The linux limits configuration is:

soft core 0
hard rss 1
hard nproc 20
soft nproc 20
hard nproc 50
hard nproc 0
maxlogins 4
nproc 20480
nofile 204800

When there are 25729 TCP connections on one machine, the datanode is very slow. How can I resolve this?
umount bad disk
I use HBase0.96 and CDH4.3.1. I use Short-Circuit Local Read:

<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/home/hadoop/cdh4-dn-socket/dn_socket</value>
</property>

When one disk is bad, I cannot run umount, because the RegionServer has some files open on the disk, for example:

sudo umount -f /disk10
umount2: Device or resource busy
umount: /disk10: device is busy
umount2: Device or resource busy
umount: /disk10: device is busy

I must stop the RegionServer in order to run the umount command. How can I remove the bad disk without stopping the RegionServer? Thanks, LiuLei
hadoop security
When I use hadoop security, I must use jsvc to start the datanode. Why must jsvc be used to start the datanode? What are the advantages of doing that? Thanks, LiuLei
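For context, the usual reason is that a secure DataNode must bind privileged ports (below 1024), so clients can trust that the daemon was started by root rather than by an arbitrary user impersonating a DataNode; jsvc binds the ports as root and then drops privileges to the HDFS user. A minimal sketch of the related settings, where the daemon user and paths are illustrative assumptions:

# hadoop-env.sh (sketch; user name and path are examples)
export HADOOP_SECURE_DN_USER=hdfs          # unprivileged user jsvc drops to
export JSVC_HOME=/usr/lib/bigtop-utils     # directory containing the jsvc binary

# hdfs-site.xml must then point the DataNode at privileged ports, e.g.:
#   dfs.datanode.address        0.0.0.0:1004
#   dfs.datanode.http.address   0.0.0.0:1006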
hadoop security
There is a DelegationToken in hadoop2. What is the role of the DelegationToken, and how do I use the DelegationToken? Thanks, LiuLei
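In short, a delegation token lets a process (typically a MapReduce task) authenticate to the NameNode on behalf of a Kerberos-authenticated user without carrying Kerberos credentials itself. A minimal sketch of obtaining one from the client API; the renewer principal "yarn" is an assumption for this example:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

public class DelegationTokenSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Ask the NameNode for a delegation token; "yarn" is the principal
    // allowed to renew it (an illustrative choice).
    Token<?> token = fs.getDelegationToken("yarn");
    // Tokens are usually shipped to tasks inside a Credentials object.
    Credentials creds = new Credentials();
    creds.addToken(token.getService(), token);
  }
}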
Decommission DataNode
In CDH3u5, when a DataNode is decommissioned, the DataNode process is shut down by the NameNode. But in CDH4.3.1, when a DataNode is decommissioned, the DataNode process is not shut down by the NameNode. When the datanode is decommissioned, why is the datanode not automatically shut down by the NameNode in CDH4.3.1? Thanks, LiuLei
ClientDatanodeProtocol.recoverBlock
In CDH3u3 there is a ClientDatanodeProtocol.recoverBlock method, which is used to recover a block when data streaming fails. But in CDH4.3.1 the recoverBlock method is not in ClientDatanodeProtocol, and when data streaming fails, the block is not recovered; will that lead to a bug? Thanks, LiuLei
./bin/hdfs haadmin -transitionToActive deadlock
I use CDH4.3.1. When I start the NameNodes and transition one NameNode to active, there is the deadlock below:

Found one Java-level deadlock:
=============================
"22558696@qtp-1616586953-6":
  waiting to lock monitor 0x2aaab3621f40 (object 0xf7646958, a org.apache.hadoop.hdfs.server.namenode.NameNode),
  which is held by "IPC Server handler 1 on 20020"
"IPC Server handler 1 on 20020":
  waiting to lock monitor 0x2aaab9052ab8 (object 0xf747f1b8, a org.apache.hadoop.metrics2.impl.MetricsSystemImpl),
  which is held by "Timer for 'NameNode' metrics system"
"Timer for 'NameNode' metrics system":
  waiting for ownable synchronizer 0xf764a858, (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync),
  which is held by "IPC Server handler 1 on 20020"
TestHDFSCLI error
I use CDH4.3.1 and run the TestHDFSCLI unit test, but I get the errors below:

2013-10-10 13:05:39,671 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(156)) - ---
2013-10-10 13:05:39,671 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(157)) - Test ID: [1]
2013-10-10 13:05:39,671 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(158)) - Test Description: [ls: file using absolute path]
2013-10-10 13:05:39,671 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(159)) -
2013-10-10 13:05:39,671 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(163)) - Test Commands: [-fs hdfs://localhost.localdomain:41053 -touchz /file1]
2013-10-10 13:05:39,672 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(163)) - Test Commands: [-fs hdfs://localhost.localdomain:41053 -ls /file1]
2013-10-10 13:05:39,672 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(167)) -
2013-10-10 13:05:39,672 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(170)) - Cleanup Commands: [-fs hdfs://localhost.localdomain:41053 -rm /file1]
2013-10-10 13:05:39,672 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(174)) -
2013-10-10 13:05:39,672 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(178)) - Comparator: [TokenComparator]
2013-10-10 13:05:39,672 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(180)) - Comparision result: [pass]
2013-10-10 13:05:39,672 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(182)) - Expected output: [Found 1 items]
2013-10-10 13:05:39,672 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(184)) - Actual output: [Found 1 items -rw-r--r-- 1 musa.ll supergroup 0 2013-10-10 13:04 /file1 ]
2013-10-10 13:05:39,673 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(178)) - Comparator: [RegexpComparator]
2013-10-10 13:05:39,673 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(180)) - Comparision result: [fail]
2013-10-10 13:05:39,673 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(182)) - Expected output: [^-rw-r--r--( )*1( )*[a-z]*( )*supergroup( )*0( )*[0-9]{4,}-[0-9]{2,}-[0-9]{2,} [0-9]{2,}:[0-9]{2,}( )*/file1]
2013-10-10 13:05:39,673 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(184)) - Actual output: [Found 1 items -rw-r--r-- 1 musa.ll supergroup 0 2013-10-10 13:04 /file1 ]

How can I handle the error? Thanks, LiuLei
NullPointerException when starting datanode
I use CDH-4.3.1. When I start the datanode, I get the error below:

2013-09-26 17:57:07,803 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 0.0.0.0:40075
2013-09-26 17:57:07,814 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dfs.webhdfs.enabled = false
2013-09-26 17:57:07,814 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 40075
2013-09-26 17:57:07,814 INFO org.mortbay.log: jetty-6.1.26.cloudera.2
2013-09-26 17:57:08,129 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:40075
2013-09-26 17:57:08,643 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 40020
2013-09-26 17:57:08,698 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /0.0.0.0:40020
2013-09-26 17:57:08,710 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: haosong-hadoop
2013-09-26 17:57:08,748 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: haosong-hadoop
2013-09-26 17:57:08,784 WARN org.apache.hadoop.hdfs.server.common.Util: Path /home/haosong.hhs/develop/soft/hadoop-2.0.0-cdh4.3.1/data should be specified as a URI in configuration files. Please update hdfs configuration.
2013-09-26 17:57:08,785 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (storage id unknown) service to /10.232.98.30:7000 starting to offer service
2013-09-26 17:57:08,785 WARN org.apache.hadoop.hdfs.server.common.Util: Path /home/haosong.hhs/develop/soft/hadoop-2.0.0-cdh4.3.1/data should be specified as a URI in configuration files. Please update hdfs configuration.
2013-09-26 17:57:08,786 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (storage id unknown) service to /10.232.98.33:7000 starting to offer service
2013-09-26 17:57:08,896 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2013-09-26 17:57:08,898 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 40020: starting
2013-09-26 17:57:09,239 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-78625276-10.232.98.30-1376034343336:blk_-2874307426466435275_16431304 src: /10.232.98.33:42654 dest: /10.232.98.33:40010
2013-09-26 17:57:09,239 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-78625276-10.232.98.30-1376034343336:blk_5332252262254683952_16431307 src: /10.232.98.30:47301 dest: /10.232.98.33:40010
2013-09-26 17:57:09,239 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-78625276-10.232.98.30-1376034343336:blk_-7540820026406349432_16431305 src: /10.232.98.30:47300 dest: /10.232.98.33:40010
2013-09-26 17:57:09,239 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-78625276-10.232.98.30-1376034343336:blk_-5489298128750533734_16431306 src: /10.232.98.33:42655 dest: /10.232.98.33:40010
2013-09-26 17:57:09,247 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /disk6/haosong-cdh4/data/in_use.lock acquired by nodename 24...@dw33.kgb.sqa.cm4
2013-09-26 17:57:09,271 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: dw33.kgb.sqa.cm4:40010:DataXceiver error processing WRITE_BLOCK operation src: /10.232.98.33:42655 dest: /10.232.98.33:40010
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:159)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:452)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:103)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:67)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:222)
    at java.lang.Thread.run(Thread.java:662)
2013-09-26 17:57:09,271 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: dw33.kgb.sqa.cm4:40010:DataXceiver error processing WRITE_BLOCK operation src: /10.232.98.33:42654 dest: /10.232.98.33:40010
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:159)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:452)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:103)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:67)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:222)
    at java.lang.Thread.run(Thread.java:662)
2013-09-26 17:57:09,271 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: dw33.kgb.sqa.cm4:40010:DataXceiver error processing WRITE_BLOCK operation src: /10.232.98.30:47300 dest: /10.232.98.33:40010
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:159)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:452) at
IncompatibleClassChangeError
I use CDH-4.3.1 and mr1; when I run one job, I get the following error:

Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
    at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:152)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1063)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1080)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:992)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:945)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596)
    at com.taobao.hbase.test.RandomKVGenerater.main(RandomKVGenerater.java:248)

How can I handle the error? Thanks, LiuLei
Re: IncompatibleClassChangeError
Yes, my job was compiled against CDH3u3, and I run the job on CDH4.3.1, but I use the mr1 of CDH4.3.1 to run the job. What is the difference between the mr1 of cdh4 and the mr of cdh3? Thanks, LiuLei

2013/9/30 Pradeep Gollakota pradeep...@gmail.com:
I believe it's a difference between the version that your code was compiled against vs the version that you're running against. Make sure that you're not packaging hadoop jars into your jar, and make sure you're compiling against the correct version as well.

On Sun, Sep 29, 2013 at 7:27 PM, lei liu liulei...@gmail.com wrote:
I use CDH-4.3.1 and mr1; when I run one job, I get the IncompatibleClassChangeError shown in the previous message. How can I handle the error? Thanks, LiuLei
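Following Pradeep's advice, a minimal sketch of how the job's build could depend on the cluster's Hadoop without packaging it into the job jar; the version string follows CDH4.3.1's MR1 artifact naming, but verify it against your repository, and treat this as an illustration rather than the poster's actual build file:

<!-- pom.xml fragment: compile against the cluster's Hadoop, keep it out of the jar -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.0.0-mr1-cdh4.3.1</version>
  <scope>provided</scope>
</dependency>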
Re: metric type
Hello, can anybody answer the question below?

2013/9/1 lei liu liulei...@gmail.com:
Hi Jitendra, thanks for your reply. If MutableCounterLong is used for IO/sec statistics, I think the value of MutableCounterLong should be divided by 10 and reset to zero every ten seconds in the MutableCounterLong.snapshot method; is that right? But the MutableCounterLong.snapshot method doesn't do that. If I missed anything, please tell me. Looking forward to your reply. Thanks, LiuLei

2013/9/1 Jitendra Yadav jeetuyadav200...@gmail.com:
Yes, MutableCounterLong helps to gather DataNode read/write statistics. There are more options available within this metric. Regards, Jitendra

On 8/31/13, lei liu liulei...@gmail.com wrote:
There is a @Metric MutableCounterLong bytesWritten attribute in DataNodeMetrics; is it used for IO/sec statistics?

2013/8/31 Jitendra Yadav jeetuyadav200...@gmail.com:
Hi, For IO/sec statistics I think MutableCounterLongRate and MutableCounterLong are more useful than the others, and for the xceiver thread number I'm not sure right now. Thanks, Jitendra

On Fri, Aug 30, 2013 at 1:40 PM, lei liu liulei...@gmail.com wrote:
Hi Jitendra, if I want to compute the number of bytes read per second and display the result in ganglia, should I use MutableCounterLong or MutableGaugeLong? If I want to display the current xceiver thread number of a datanode in ganglia, should I use MutableCounterLong or MutableGaugeLong? Thanks, LiuLei

2013/8/30 Jitendra Yadav jeetuyadav200...@gmail.com:
Hi, the link below contains the answer to your question. http://hadoop.apache.org/docs/r1.2.0/api/org/apache/hadoop/metrics2/package-summary.html Regards, Jitendra

On Fri, Aug 30, 2013 at 11:35 AM, lei liu liulei...@gmail.com wrote:
I use metrics v2; there are COUNTER and GAUGE metric types in metrics v2. What is the difference between the two? Thanks, LiuLei
Re: metric type
Hi Jitendra, thanks for your reply. If MutableCounterLong is used for IO/sec statistics, I think the value of MutableCounterLong should be divided by 10 and reset to zero every ten seconds in the MutableCounterLong.snapshot method; is that right? But the MutableCounterLong.snapshot method doesn't do that. If I missed anything, please tell me. Looking forward to your reply. Thanks, LiuLei

2013/9/1 Jitendra Yadav jeetuyadav200...@gmail.com:
Yes, MutableCounterLong helps to gather DataNode read/write statistics. There are more options available within this metric. Regards, Jitendra

On 8/31/13, lei liu liulei...@gmail.com wrote:
There is a @Metric MutableCounterLong bytesWritten attribute in DataNodeMetrics; is it used for IO/sec statistics?

2013/8/31 Jitendra Yadav jeetuyadav200...@gmail.com:
Hi, For IO/sec statistics I think MutableCounterLongRate and MutableCounterLong are more useful than the others, and for the xceiver thread number I'm not sure right now. Thanks, Jitendra

On Fri, Aug 30, 2013 at 1:40 PM, lei liu liulei...@gmail.com wrote:
Hi Jitendra, if I want to compute the number of bytes read per second and display the result in ganglia, should I use MutableCounterLong or MutableGaugeLong? If I want to display the current xceiver thread number of a datanode in ganglia, should I use MutableCounterLong or MutableGaugeLong? Thanks, LiuLei

2013/8/30 Jitendra Yadav jeetuyadav200...@gmail.com:
Hi, the link below contains the answer to your question. http://hadoop.apache.org/docs/r1.2.0/api/org/apache/hadoop/metrics2/package-summary.html Regards, Jitendra

On Fri, Aug 30, 2013 at 11:35 AM, lei liu liulei...@gmail.com wrote:
I use metrics v2; there are COUNTER and GAUGE metric types in metrics v2. What is the difference between the two? Thanks, LiuLei
Re: metric type
There is a @Metric MutableCounterLong bytesWritten attribute in DataNodeMetrics; is it used for IO/sec statistics?

2013/8/31 Jitendra Yadav jeetuyadav200...@gmail.com:
Hi, For IO/sec statistics I think MutableCounterLongRate and MutableCounterLong are more useful than the others, and for the xceiver thread number I'm not sure right now. Thanks, Jitendra

On Fri, Aug 30, 2013 at 1:40 PM, lei liu liulei...@gmail.com wrote:
Hi Jitendra, if I want to compute the number of bytes read per second and display the result in ganglia, should I use MutableCounterLong or MutableGaugeLong? If I want to display the current xceiver thread number of a datanode in ganglia, should I use MutableCounterLong or MutableGaugeLong? Thanks, LiuLei

2013/8/30 Jitendra Yadav jeetuyadav200...@gmail.com:
Hi, the link below contains the answer to your question. http://hadoop.apache.org/docs/r1.2.0/api/org/apache/hadoop/metrics2/package-summary.html Regards, Jitendra

On Fri, Aug 30, 2013 at 11:35 AM, lei liu liulei...@gmail.com wrote:
I use metrics v2; there are COUNTER and GAUGE metric types in metrics v2. What is the difference between the two? Thanks, LiuLei
namenode name dir
I use QJM; do I need to configure two directories for dfs.namenode.name.dir, one local filesystem path and one NFS path? I think the Standby NameNode also stores the fsimage, so I think I only need to configure one local filesystem path. Thanks, LiuLei
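A minimal sketch of the single local directory that reasoning implies, where the path is illustrative: with QJM providing shared, replicated edits and the Standby keeping its own fsimage, an NFS entry here is not required for HA, though some operators still list a second local disk for extra safety.

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///data/hadoop/name</value>
</property>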
metric type
I use metrics v2; there are COUNTER and GAUGE metric types in metrics v2. What is the difference between the two? Thanks, LiuLei
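For illustration, a minimal sketch of a metrics2 source that uses both types; the class and metric names are invented for this example. A COUNTER such as MutableCounterLong only ever increases, which suits running totals like bytes read, while a GAUGE such as MutableGaugeLong can move both up and down, which suits instantaneous values like thread counts.

import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;

@Metrics(about = "Example metrics source", context = "dfs")
public class ExampleMetrics {
  @Metric("Total bytes read")       MutableCounterLong bytesRead;    // COUNTER: monotonic
  @Metric("Active xceiver threads") MutableGaugeLong   xceiverCount; // GAUGE: up and down

  public static ExampleMetrics create() {
    // Registering lets the metrics system instantiate the @Metric fields.
    return DefaultMetricsSystem.instance()
        .register("Example", "Example metrics source", new ExampleMetrics());
  }

  void onRead(long bytes) { bytesRead.incr(bytes); }
  void onXceiverStart()   { xceiverCount.incr(); }
  void onXceiverFinish()  { xceiverCount.decr(); }
}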
Re: metric type
Hi Jitendra, if I want to compute the number of bytes read per second and display the result in ganglia, should I use MutableCounterLong or MutableGaugeLong? If I want to display the current xceiver thread number of a datanode in ganglia, should I use MutableCounterLong or MutableGaugeLong? Thanks, LiuLei

2013/8/30 Jitendra Yadav jeetuyadav200...@gmail.com:
Hi, the link below contains the answer to your question. http://hadoop.apache.org/docs/r1.2.0/api/org/apache/hadoop/metrics2/package-summary.html Regards, Jitendra

On Fri, Aug 30, 2013 at 11:35 AM, lei liu liulei...@gmail.com wrote:
I use metrics v2; there are COUNTER and GAUGE metric types in metrics v2. What is the difference between the two? Thanks, LiuLei
domain socket
There are dfs.client.read.shortcircuit and dfs.client.domain.socket.data.traffic configurations for the domain socket. What is the difference between them? Thanks, LiuLei
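A short sketch of the two settings side by side, based on the stock descriptions of these hdfs-site.xml keys (verify against your version): dfs.client.read.shortcircuit lets the client bypass the DataNode entirely and read local block files directly from disk, while dfs.client.domain.socket.data.traffic makes ordinary, non-short-circuit read traffic flow over the UNIX domain socket instead of TCP.

<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>   <!-- read local replicas directly, bypassing the DataNode -->
</property>
<property>
  <name>dfs.client.domain.socket.data.traffic</name>
  <value>false</value>  <!-- if true, route normal data traffic over the domain socket -->
</property>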
hadoop2 and Hbase0.94
I use hadoop2 and hbase0.94, but there is the exception below:

2013-08-28 11:36:12,922 ERROR [MASTER_TABLE_OPERATIONS-dw74.kgb.sqa.cm4,13646,1377660964832-0] executor.EventHandler(172): Caught throwable while processing event C_M_DELETE_TABLE
java.lang.IllegalArgumentException: Wrong FS: file:/tmp/hbase-shenxiu.cx/hbase/observed_table/47b334989065a8ac84873e6d07c1de62, expected: hdfs://localhost.localdomain:35974
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:590)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:172)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:402)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1427)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1467)
    at org.apache.hadoop.hbase.util.FSUtils.listStatus(FSUtils.java:1052)
    at org.apache.hadoop.hbase.backup.HFileArchiver.archiveRegion(HFileArchiver.java:123)
    at org.apache.hadoop.hbase.backup.HFileArchiver.archiveRegion(HFileArchiver.java:72)
    at org.apache.hadoop.hbase.master.MasterFileSystem.deleteRegion(MasterFileSystem.java:444)
    at org.apache.hadoop.hbase.master.handler.DeleteTableHandler.handleTableOperation(DeleteTableHandler.java:73)
    at org.apache.hadoop.hbase.master.handler.TableEventHandler.process(TableEventHandler.java:96)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
2013-08-28 11:37:05,653 INFO [Master:0;dw74.kgb.sqa.cm4,13646,1377660964832.archivedHFileCleaner] util.FSUtils(1055): hdfs://localhost.localdomain:35974/use
Re: hadoop2 and Hbase0.94
The exception occurs when I run the hbase unit test.

2013/8/28 Harsh J ha...@cloudera.com:
Moving to u...@hbase.apache.org. Please share your hbase-site.xml and core-site.xml. Was this HBase cluster previously running in standalone local-filesystem mode?

On Wed, Aug 28, 2013 at 2:06 PM, lei liu liulei...@gmail.com wrote:
I use hadoop2 and hbase0.94, but there is the Wrong FS exception quoted in the previous message.
--
Harsh J
Re: hadoop2 and Hbase0.94
The exception occurs in the org.apache.hadoop.hbase.coprocessor.TestMasterObserver unit test.

2013/8/28 lei liu liulei...@gmail.com:
The exception occurs when I run the hbase unit test.

2013/8/28 Harsh J ha...@cloudera.com:
Moving to u...@hbase.apache.org. Please share your hbase-site.xml and core-site.xml. Was this HBase cluster previously running in standalone local-filesystem mode?

On Wed, Aug 28, 2013 at 2:06 PM, lei liu liulei...@gmail.com wrote:
I use hadoop2 and hbase0.94, but there is the Wrong FS exception quoted in the earlier message.
--
Harsh J
Re: when the Standby NameNode is doing a checkpoint, the Active NameNode is slow
Hi Jitendra, I don't use the compression parameter. My network card is 100M/s, and I set dfs.image.transfer.bandwidthPerSec to 50M, so I think the Active NameNode still has 50M of bandwidth to handle RPC requests; why did the OPS drop by 50%?

2013/8/15 Jitendra Yadav jeetuyadav200...@gmail.com:
Hi, looks like you got some pace; did you also try the compression parameter? I think you will get more optimization with it. Also, file transfer speed depends on the network bandwidth between the PNN/SNN and the network traffic between nodes. What's your network configuration? Thanks

On Wed, Aug 14, 2013 at 11:39 AM, lei liu liulei...@gmail.com wrote:
I set dfs.image.transfer.bandwidthPerSec to 50M, and the performance is below:

2013-08-14 12:32:33,079 INFO my.EditLogPerformance: totalCount:1342440 speed:
2013-08-14 12:32:43,082 INFO my.EditLogPerformance: totalCount:1363338 speed:1044
2013-08-14 12:32:53,085 INFO my.EditLogPerformance: totalCount:1385526 speed:1109
*2013-08-14 12:33:03,087 INFO my.EditLogPerformance: totalCount:1396324 speed:539*
*2013-08-14 12:33:13,090 INFO my.EditLogPerformance: totalCount:1406232 speed:495
2013-08-14 12:33:23,093 INFO my.EditLogPerformance: totalCount:1415006 speed:438
2013-08-14 12:33:33,096 INFO my.EditLogPerformance: totalCount:1423952 speed:447*
*2013-08-14 12:33:43,099 INFO my.EditLogPerformance: totalCount:1437256 speed:665*
2013-08-14 12:33:53,102 INFO my.EditLogPerformance: totalCount:1458378 speed:1056
2013-08-14 12:34:03,106 INFO my.EditLogPerformance: totalCount:1479338 speed:1048
2013-08-14 12:34:13,108 INFO my.EditLogPerformance: totalCount:1500400 speed:1053
2013-08-14 12:34:23,111 INFO my.EditLogPerformance: totalCount:1521252 speed:1042
2013-08-14 12:34:33,114 INFO my.EditLogPerformance: totalCount:1542286 speed:1051
2013-08-14 12:34:43,117 INFO my.EditLogPerformance: totalCount:1562956 speed:1033
2013-08-14 12:34:53,120 INFO my.EditLogPerformance: totalCount:1583804 speed:1042
2013-08-14 12:35:03,123 INFO my.EditLogPerformance: totalCount:1606558 speed:1137
2013-08-14 12:35:13,126 INFO my.EditLogPerformance: totalCount:1627980 speed:1071
2013-08-14 12:35:23,129 INFO my.EditLogPerformance: totalCount:1650642 speed:1133
2013-08-14 12:35:33,132 INFO my.EditLogPerformance: totalCount:1672806 speed:1108
2013-08-14 12:35:43,134 INFO my.EditLogPerformance: totalCount:1693940 speed:1056
2013-08-14 12:35:53,137 INFO my.EditLogPerformance: totalCount:1715430 speed:1074
2013-08-14 12:36:03,140 INFO my.EditLogPerformance: totalCount:1737940 speed:1125
2013-08-14 12:36:13,143 INFO my.EditLogPerformance: totalCount:1760094 speed:1107
2013-08-14 12:36:23,146 INFO my.EditLogPerformance: totalCount:1781646 speed:1077
2013-08-14 12:36:33,149 INFO my.EditLogPerformance: totalCount:1802230 speed:1029
2013-08-14 12:36:43,152 INFO my.EditLogPerformance: totalCount:1824132 speed:1095
2013-08-14 12:36:53,155 INFO my.EditLogPerformance: totalCount:1846778 speed:1132
2013-08-14 12:37:03,158 INFO my.EditLogPerformance: totalCount:1868956 speed:1108
2013-08-14 12:37:13,161 INFO my.EditLogPerformance: totalCount:1888556 speed:980
2013-08-14 12:37:23,164 INFO my.EditLogPerformance: totalCount:1910512 speed:1097
2013-08-14 12:37:33,167 INFO my.EditLogPerformance: totalCount:1932240 speed:1086
2013-08-14 12:37:43,170 INFO my.EditLogPerformance: totalCount:1954226 speed:1099
2013-08-14 12:37:53,173 INFO my.EditLogPerformance: totalCount:1974706 speed:1024
2013-08-14 12:38:03,176 INFO my.EditLogPerformance: totalCount:1993906 speed:960
2013-08-14 12:38:13,179 INFO my.EditLogPerformance: totalCount:2014172 speed:1013
2013-08-14 12:38:23,182 INFO my.EditLogPerformance: totalCount:2036130 speed:1097
2013-08-14 12:38:33,184 INFO my.EditLogPerformance: totalCount:2057848 speed:1085
2013-08-14 12:38:43,187 INFO my.EditLogPerformance: totalCount:2078834 speed:1049
2013-08-14 12:38:53,190 INFO my.EditLogPerformance: totalCount:2095616 speed:839
*2013-08-14 12:39:03,193 INFO my.EditLogPerformance: totalCount:2104896 speed:464
2013-08-14 12:39:13,196 INFO my.EditLogPerformance: totalCount:2114572 speed:483
2013-08-14 12:39:23,199 INFO my.EditLogPerformance: totalCount:2123512 speed:447*
*2013-08-14 12:39:33,202 INFO my.EditLogPerformance: totalCount:2133604 speed:504*
2013-08-14 12:39:43,205 INFO my.EditLogPerformance: totalCount:2149792 speed:809

There is the info below in the Active NameNode:

2013-08-14 12:44:47,301 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=655178418&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
2013-08-14 12:48:57,529 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: *Transfer took 250.23s at 10280.59 KB/s*
2013-08-14 12:48:57,530 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file fsimage.ckpt_00655178418 size
dynamic configuration
There is a ReconfigurationServlet class in hadoop-2.0.5. How do I use this function for the NameNode and DataNode? Thanks, LiuLei
Re: when the Standby NameNode is doing a checkpoint, the Active NameNode is slow
The fsimage file size is 1658934155 bytes.

2013/8/13 Harsh J ha...@cloudera.com:
How large are your checkpointed fsimage files?

On Mon, Aug 12, 2013 at 3:42 PM, lei liu liulei...@gmail.com wrote:
When the Standby NameNode does a checkpoint and uploads the image file to the Active NameNode, the Active NameNode is very slow. What causes the Active NameNode to be slow? Thanks, LiuLei
--
Harsh J
Re: when the Standby NameNode is doing a checkpoint, the Active NameNode is slow
my.EditLogPerformance (EditLogPerformance.java:run(37)) - totalCount:11087546 speed:6
2013-08-13 17:49:51,599 INFO my.EditLogPerformance (EditLogPerformance.java:run(37)) - totalCount:11087716 speed:8
2013-08-13 17:50:01,602 INFO my.EditLogPerformance (EditLogPerformance.java:run(37)) - totalCount:11091608 speed:194

The speed is sometimes less than ten. I find that when the Active NameNode downloads the fsimage file, the speed is less than ten, so I think downloading the fsimage file affects the performance of the Active NameNode.

There is the info below in the Standby NameNode:

2013-08-13 17:48:12,412 INFO org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering checkpoint because there have been 2558038 txns since the last checkpoint, which exceeds the configured threshold 100
2013-08-13 17:48:12,413 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file /home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_00521186406 using no compression
2013-08-13 17:49:19,085 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 3385425100 saved in 66 seconds.
2013-08-13 17:49:19,655 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://10.232.98.77:20021/getimage?putimage=1&txid=521186406&port=20021&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
2013-08-13 17:53:21,107 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took 241.45s at 0.00 KB/s
2013-08-13 17:53:21,107 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 521186406 to namenode at 10.232.98.77:20021

There is the info below in the Active NameNode:

2013-08-13 17:49:19,659 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=521186406&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
2013-08-13 17:53:20,610 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took 240.95s at 13720.96 KB/s
2013-08-13 17:53:20,610 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file fsimage.ckpt_00521186406 size 3385425100 bytes.

2013/8/13 Jitendra Yadav jeetuyadav200...@gmail.com:
Hi, can you please let me know how you identified the slowness between the primary and standby namenode? Also, please share the network connection bandwidth between these two servers. Thanks

On Tue, Aug 13, 2013 at 11:52 AM, lei liu liulei...@gmail.com wrote:
The fsimage file size is 1658934155 bytes.

2013/8/13 Harsh J ha...@cloudera.com:
How large are your checkpointed fsimage files?

On Mon, Aug 12, 2013 at 3:42 PM, lei liu liulei...@gmail.com wrote:
When the Standby NameNode does a checkpoint and uploads the image file to the Active NameNode, the Active NameNode is very slow. What causes the Active NameNode to be slow? Thanks, LiuLei
--
Harsh J

EditLogPerformance.java
Description: Binary data
when the Standby NameNode is doing a checkpoint, the Active NameNode is slow
When the Standby NameNode does a checkpoint and uploads the image file to the Active NameNode, the Active NameNode is very slow. What causes the Active NameNode to be slow? Thanks, LiuLei
Re: MutableCounterLong metrics display in ganglia
Thanks Harsh for your reply. What is the difference between the MutableCounterLong and MutableGaugeLong classes? I find that MutableCounterLong is used to calculate throughput, with the value reset every ten seconds, and that MutableGaugeLong up-counts and never resets. I am new to hadoop-2.0.5; please tell me if there is an error. Thanks, LiuLei

2013/8/9 Harsh J ha...@cloudera.com:
The counter, being num-ops, should up-count and not reset. Note that your test may be at fault though - calling hsync may not always call NN#fsync(…) unless you are passing the proper flags to make it always do so.

On Wed, Aug 7, 2013 at 4:27 PM, lei liu liulei...@gmail.com wrote:
I use hadoop-2.0.5 and configured the hadoop-metrics2.properties file with the content below:

*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
*.sink.ganglia.supportsparse=true
namenode.sink.ganglia.servers=10.232.98.74:8649
datanode.sink.ganglia.servers=10.232.98.74:8649

I wrote one program that calls the FSDataOutputStream.hsync() method once per second. There is a @Metric MutableCounterLong fsyncCount metric in DataNodeMetrics; when the FSDataOutputStream.hsync() method is called, the value of fsyncCount is increased, and the dataNode sends the value of fsyncCount to ganglia every ten seconds, so I think the value of fsyncCount in ganglia should be 10, 20, 30, 40 and so on, but ganglia displays 1, 1, 1, 1, 1, ..., as if the value of fsyncCount were set to zero every ten seconds and divided by 10 ("fsyncCount.value/10"). Is the value of the MutableCounterLong class set to zero every ten seconds and divided by 10? Thanks, LiuLei
--
Harsh J
MutableCounterLong and MutableGaugeLong class difference in metrics v2
I use hadoop-2.0.5; there are MutableCounterLong and MutableGaugeLong classes in metrics v2, and I am studying the metrics v2 code. What is the difference between the MutableCounterLong and MutableGaugeLong classes? I find that MutableCounterLong is used to calculate throughput; is that right? How does metrics v2 handle the MutableCounterLong class? Thanks, LiuLei
MutableCounterLong metrics display in ganglia
I use hadoop-2.0.5 and configured the hadoop-metrics2.properties file with the content below:

*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
*.sink.ganglia.supportsparse=true
namenode.sink.ganglia.servers=10.232.98.74:8649
datanode.sink.ganglia.servers=10.232.98.74:8649

I wrote one program that calls the FSDataOutputStream.hsync() method once per second. There is a @Metric MutableCounterLong fsyncCount metric in DataNodeMetrics; when the FSDataOutputStream.hsync() method is called, the value of fsyncCount is increased, and the dataNode sends the value of fsyncCount to ganglia every ten seconds, so I think the value of fsyncCount in ganglia should be 10, 20, 30, 40 and so on, but ganglia displays 1, 1, 1, 1, 1, ..., as if the value of fsyncCount were set to zero every ten seconds and divided by 10 ("fsyncCount.value/10"). Is the value of the MutableCounterLong class set to zero every ten seconds and divided by 10? Thanks, LiuLei
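A minimal sketch of the test program described above, assuming a writable HDFS path; the path and run length are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HsyncTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    FSDataOutputStream out = fs.create(new Path("/tmp/hsync-test"));
    byte[] line = "x\n".getBytes("UTF-8");
    for (int i = 0; i < 600; i++) {   // run for ten minutes
      out.write(line);
      out.hsync();                    // each call should bump fsyncCount on a DataNode
      Thread.sleep(1000L);            // once per second
    }
    out.close();
  }
}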
throughput metrics in hadoop-2.0.5
I use hadoop-2.0.5 and configured the hadoop-metrics2.properties file with the content below:

*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
*.sink.ganglia.supportsparse=true
namenode.sink.ganglia.servers=10.232.98.74:8649
datanode.sink.ganglia.servers=10.232.98.74:8649

I wrote one program that calls the FSDataOutputStream.hsync() method once per second. There is a @Metric MutableCounterLong fsyncCount metric in DataNodeMetrics, and the MutableCounterLong class continuously increases the value, so I think the value in ganglia should be 10, 20, 30, 40 and so on, but the value in ganglia is below:

[image: dw62.kgb.sqa.cm4 dfs.datanode.FsyncCount]

I want to know how ganglia displays the value of the MutableCounterLong class. Thanks, LiuLei
MutableRate metrics in hadoop-2.0.5
There is this code in the MutableRate class:

public synchronized void snapshot(MetricsRecordBuilder builder, boolean all) {
  if (all || changed()) {
    numSamples += intervalStat.numSamples();
    builder.addCounter(numInfo, numSamples)
           .addGauge(avgInfo, lastStat().mean());
    if (extended) {
      builder.addGauge(stdevInfo, lastStat().stddev())
             .addGauge(iMinInfo, lastStat().min())
             .addGauge(iMaxInfo, lastStat().max())
             .addGauge(minInfo, minMax.min())
             .addGauge(maxInfo, minMax.max());
    }
    if (changed()) {
      if (numSamples > 0) {
        intervalStat.copyTo(prevStat);
        intervalStat.reset();
      }
      clearChanged();
    }
  }
}

How can I set the extended variable to true? Thanks, LiuLei
Re: throughput metrics in hadoop-2.0.5
There is a @Metric MutableCounterLong fsyncCount metric in DataNodeMetrics, and the MutableCounterLong class continuously increases the value, so I think the value in ganglia should be 10, 20, 30, 40 and so on, but the displayed value is fsyncCount.value/10, that is, 1, 1, 1, 1 in ganglia. How does ganglia display the value of the MutableCounterLong class? Is it fsyncCount.value or fsyncCount.value/10?

2013/8/6 lei liu liulei...@gmail.com:
I use hadoop-2.0.5 and configured the hadoop-metrics2.properties file with the content below:

*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
*.sink.ganglia.supportsparse=true
namenode.sink.ganglia.servers=10.232.98.74:8649
datanode.sink.ganglia.servers=10.232.98.74:8649

I wrote one program that calls the FSDataOutputStream.hsync() method once per second. There is a @Metric MutableCounterLong fsyncCount metric in DataNodeMetrics, and the MutableCounterLong class continuously increases the value, so I think the value in ganglia should be 10, 20, 30, 40 and so on, but the value in ganglia is below:

[image: dw62.kgb.sqa.cm4 dfs.datanode.FsyncCount]

I want to know how ganglia displays the value of the MutableCounterLong class. Thanks, LiuLei
Re: throughput metrics in hadoop-2.0.5
Is the value of the MutableCounterLong class set to zero every 10 seconds?

2013/8/6 lei liu liulei...@gmail.com:
There is a @Metric MutableCounterLong fsyncCount metric in DataNodeMetrics, and the MutableCounterLong class continuously increases the value, so I think the value in ganglia should be 10, 20, 30, 40 and so on, but the displayed value is fsyncCount.value/10, that is, 1, 1, 1, 1 in ganglia. How does ganglia display the value of the MutableCounterLong class? Is it fsyncCount.value or fsyncCount.value/10?

2013/8/6 lei liu liulei...@gmail.com:
I use hadoop-2.0.5 and configured the hadoop-metrics2.properties file with the content below:

*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
*.sink.ganglia.supportsparse=true
namenode.sink.ganglia.servers=10.232.98.74:8649
datanode.sink.ganglia.servers=10.232.98.74:8649

I wrote one program that calls the FSDataOutputStream.hsync() method once per second. There is a @Metric MutableCounterLong fsyncCount metric in DataNodeMetrics, and the MutableCounterLong class continuously increases the value, so I think the value in ganglia should be 10, 20, 30, 40 and so on, but the value in ganglia is below:

[image: dw62.kgb.sqa.cm4 dfs.datanode.FsyncCount]

I want to know how ganglia displays the value of the MutableCounterLong class. Thanks, LiuLei
Re: throughput metrics in hadoop-2.0.5
Is the value of the MutableCounterLong class set to zero every 10 seconds?

2013/8/6 lei liu liulei...@gmail.com:
Is the value of the MutableCounterLong class set to zero every 10 seconds?

2013/8/6 lei liu liulei...@gmail.com:
There is a @Metric MutableCounterLong fsyncCount metric in DataNodeMetrics, and the MutableCounterLong class continuously increases the value, so I think the value in ganglia should be 10, 20, 30, 40 and so on, but the displayed value is fsyncCount.value/10, that is, 1, 1, 1, 1 in ganglia. How does ganglia display the value of the MutableCounterLong class? Is it fsyncCount.value or fsyncCount.value/10?

2013/8/6 lei liu liulei...@gmail.com:
I use hadoop-2.0.5 and configured the hadoop-metrics2.properties file with the content below:

*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
*.sink.ganglia.supportsparse=true
namenode.sink.ganglia.servers=10.232.98.74:8649
datanode.sink.ganglia.servers=10.232.98.74:8649

I wrote one program that calls the FSDataOutputStream.hsync() method once per second. There is a @Metric MutableCounterLong fsyncCount metric in DataNodeMetrics, and the MutableCounterLong class continuously increases the value, so I think the value in ganglia should be 10, 20, 30, 40 and so on, but the value in ganglia is below:

[image: dw62.kgb.sqa.cm4 dfs.datanode.FsyncCount]

I want to know how ganglia displays the value of the MutableCounterLong class. Thanks, LiuLei
Re: metrics v1 in hadoop-2.0.5
There is a hadoop-metrics.properties file in the etc/hadoop directory. I configured the file with the content below:

dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
dfs.period=10
dfs.servers=dw74:8649

But the configuration does not work. Can I only use metrics v2 in hadoop-2.0.5?

2013/8/5 lei liu liulei...@gmail.com:
Can I use metrics v1 in hadoop-2.0.5? Thanks, LiuLei
metrics v1 in hadoop-2.0.5
Can I use metrics v1 in hadoop-2.0.5? Thanks, LiuLei
Standby NameNode checkpoint exception
I use hadoop-2.0.5 and QJM for HA. When the Standby NameNode does a checkpoint, there is the exception below in the Standby NameNode:

2013-08-01 13:43:07,965 INFO org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering checkpoint because there have been 763426 txns since the last checkpoint, which exceeds the configured threshold 4
2013-08-01 13:43:07,966 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file /home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_00048708235 using no compression
2013-08-01 13:43:37,405 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 1504089705 saved in 29 seconds.
2013-08-01 13:43:37,410 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Going to retain 2 images with txid >= 47944809
2013-08-01 13:43:37,410 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Purging old image FSImageFile(file=/home/musa.ll/hadoop2/cluster-data/name/current/fsimage_00047222679, cpktTxId=00047222679)
2013-08-01 13:43:37,723 WARN org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to determine input streams from QJM to [10.232.98.61:20022, 10.232.98.62:20022, 10.232.98.63:20022, 10.232.98.64:20022, 10.232.98.65:20022]. Skipping.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 3/5. 4 exceptions thrown:
10.232.98.62:20022: Asked for firstTxId 46944810 which is in the middle of file /home/musa.ll/hadoop2/journal/mycluster/current/edits_00046630461-00047222679
    at org.apache.hadoop.hdfs.server.namenode.FileJournalManager.getRemoteEditLogs(FileJournalManager.java:183)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.getEditLogManifest(Journal.java:628)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:180)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:203)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14028)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)

2013-08-01 14:28:07,051 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took 26.08s at 0.00 KB/s
2013-08-01 14:28:07,051 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 60835762 to namenode at 10.232.98.77:20021
2013-08-01 14:29:05,203 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering log roll on remote NameNode /10.232.98.77:20020
2013-08-01 14:29:06,242 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 137678/567332 transactions completed. (24%)
2013-08-01 14:29:07,243 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 275618/567332 transactions completed. (49%)
2013-08-01 14:29:08,244 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 407627/567332 transactions completed. (72%)
2013-08-01 14:29:09,245 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 545153/567332 transactions completed. (96%)
2013-08-01 14:29:20,146 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Loaded 567332 edits starting from txid 60835762
2013-08-01 14:30:44,411 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 1950604672 saved in 37 seconds.
2013-08-01 14:30:44,416 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Going to retain 2 images with txid >= 60835762
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 3/5. 4 exceptions thrown:
10.232.98.62:20022: Asked for firstTxId 59835763 which is in the middle of file /home/musa.ll/hadoop2/journal/mycluster/current/edits_00059678382-00060264590
    at org.apache.hadoop.hdfs.server.namenode.FileJournalManager.getRemoteEditLogs(FileJournalManager.java:183)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.getEditLogManifest(Journal.java:628)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:180)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:203)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14028)
    at
Re: ./hdfs namenode -bootstrapStandby error
Hi Azuryy, to run the 'hdfs namenode -initializeSharedEdits' command on the active NN, I must stop the Active NameNode first. I think that when executing the ./hdfs namenode -bootstrapStandby command on the Standby NameNode, the Active NameNode and JournalNodes should be alive; otherwise there is no HA. I am a beginner with QJM, so if something is wrong, please correct me. Thanks, LiuLei

2013/7/19 Azuryy Yu azury...@gmail.com:
hi, can you use 'hdfs namenode -initializeSharedEdits' on the active NN? Remember to start all journal nodes before trying this.

On Jul 19, 2013 5:17 PM, lei liu liulei...@gmail.com wrote:
I use the hadoop-2.0.5 version and QJM for HA. I run ./hdfs namenode -bootstrapStandby for the Standby NameNode, but it reports the "Gap in transactions" error shown in the next message. Why does the ./hdfs namenode -bootstrapStandby command report the error, and how can I initialize the Standby NameNode? Thanks, LiuLei
./hdfs namenode -bootstrapStandby error
I use the hadoop-2.0.5 version and QJM for HA. I run ./hdfs namenode -bootstrapStandby for the Standby NameNode, but it reports the error below:

=====================================================
About to bootstrap Standby ID nn2 from:
           Nameservice ID: mycluster
        Other Namenode ID: nn1
  Other NN's HTTP address: 10.232.98.77:20021
   Other NN's IPC address: dw77.kgb.sqa.cm4/10.232.98.77:20020
             Namespace ID: 1499625118
            Block pool ID: BP-2012507965-10.232.98.77-1372993302021
               Cluster ID: CID-921af0aa-b831-4828-965c-3b71a5149600
           Layout version: -40
=====================================================
Re-format filesystem in Storage Directory /home/musa.ll/hadoop2/cluster-data/name ? (Y or N) Y
13/07/19 17:04:28 INFO common.Storage: Storage directory /home/musa.ll/hadoop2/cluster-data/name has been successfully formatted.
13/07/19 17:04:29 FATAL ha.BootstrapStandby: Unable to read transaction ids 16317-16337 from the configured shared edits storage qjournal://10.232.98.61:20022;10.232.98.62:20022;10.232.98.63:20022/mycluster. Please copy these logs into the shared edits storage or call saveNamespace on the active node.
Error: Gap in transactions. Expected to be able to read up until at least txid 16337 but unable to find any edit logs containing txid 16331
13/07/19 17:04:29 INFO util.ExitUtil: Exiting with status 6

The edit logs on the JournalNode have the content below:

-rw-r--r-- 1 musa.ll users      30 Jul 19 15:51 edits_0016327-0016328
-rw-r--r-- 1 musa.ll users      30 Jul 19 15:53 edits_0016329-0016330
-rw-r--r-- 1 musa.ll users 1048576 Jul 19 17:03 edits_inprogress_0016331

The edits_inprogress_0016331 file should contain the 16331-16337 transactions, so why does the ./hdfs namenode -bootstrapStandby command report the error? How can I initialize the Standby NameNode? Thanks, LiuLei
QJM and dfs.namenode.edits.dir
When I use QJM for HA, do I need to save the edit log on the local filesystem? I think QJM already provides high availability for the edit log, so I should not need to configure dfs.namenode.edits.dir. Thanks, LiuLei
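For reference, a minimal sketch (hostnames and paths are hypothetical, not a tested configuration) of how these properties relate: with QJM the JournalNodes hold the authoritative edit log, but the NameNode still needs a local directory for its fsimage, so dfs.namenode.name.dir remains required even when no local dfs.namenode.edits.dir is configured.

import org.apache.hadoop.conf.Configuration;

public class QjmEditsDirSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Local storage for the fsimage checkpoint; still needed with QJM.
    conf.set("dfs.namenode.name.dir", "/data/namenode/name");
    // The JournalNodes keep the authoritative edit log for the nameservice.
    conf.set("dfs.namenode.shared.edits.dir",
        "qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster");
    // dfs.namenode.edits.dir is left unset here; if set, it would add an
    // extra local copy of the edits alongside the journal.
    System.out.println(conf.get("dfs.namenode.shared.edits.dir"));
  }
}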
QJM for federation
I have two namespaces, for example:

<property>
  <name>dfs.nameservices</name>
  <value>ns1,ns2</value>
</property>

Can I configure dfs.namenode.shared.edits.dir with the content below?

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://10.232.98.61:20022;10.232.98.62:20022;10.232.98.63:20022/nn1,nn2</value>
</property>

Thanks, LiuLei
Re: QJM for federation
Thanks Harsh. 2013/7/17 Harsh J ha...@cloudera.com This has been asked previously. Use suffixes to solve your issue. See http://search-hadoop.com/m/Fingkg6Dk91 On Wed, Jul 17, 2013 at 1:33 PM, lei liu liulei...@gmail.com wrote: I have two namespaces, for example:

<property>
  <name>dfs.nameservices</name>
  <value>ns1,ns2</value>
</property>

Can I configure dfs.namenode.shared.edits.dir with the content below?

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://10.232.98.61:20022;10.232.98.62:20022;10.232.98.63:20022/nn1,nn2</value>
</property>

Thanks, LiuLei -- Harsh J
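To make Harsh's suggestion concrete, here is a minimal sketch of the suffix approach, hedged rather than authoritative: the journal hosts are taken from the question, and using the nameservice IDs ns1 and ns2 as the journal IDs at the end of each URI is my assumption.

import org.apache.hadoop.conf.Configuration;

public class FederatedQjmSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("dfs.nameservices", "ns1,ns2");
    // One shared-edits setting per nameservice, distinguished by the
    // property suffix and by the journal ID at the end of each URI.
    conf.set("dfs.namenode.shared.edits.dir.ns1",
        "qjournal://10.232.98.61:20022;10.232.98.62:20022;10.232.98.63:20022/ns1");
    conf.set("dfs.namenode.shared.edits.dir.ns2",
        "qjournal://10.232.98.61:20022;10.232.98.62:20022;10.232.98.63:20022/ns2");
    System.out.println(conf.get("dfs.namenode.shared.edits.dir.ns1"));
  }
}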
Re: QJM for federation
I have another question about QJM. If I use QJM for HA, do I need to save the edit log on the local filesystem? I think QJM already provides high availability for the edit log, so I should not need to configure dfs.namenode.edits.dir. Thanks, LiuLei

2013/7/17 lei liu liulei...@gmail.com Thanks Harsh. 2013/7/17 Harsh J ha...@cloudera.com This has been asked previously. Use suffixes to solve your issue. See http://search-hadoop.com/m/Fingkg6Dk91 On Wed, Jul 17, 2013 at 1:33 PM, lei liu liulei...@gmail.com wrote: I have two namespaces, for example:

<property>
  <name>dfs.nameservices</name>
  <value>ns1,ns2</value>
</property>

Can I configure dfs.namenode.shared.edits.dir with the content below?

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://10.232.98.61:20022;10.232.98.62:20022;10.232.98.63:20022/nn1,nn2</value>
</property>

Thanks, LiuLei -- Harsh J
block over-replicated
I use hadoop-2.0.3. I find that when one block is over-replicated, the excess replicas are added to the excessReplicateMap attribute of BlockManager. But when the block is deleted, or the block reaches its intended number of replicas, the replicas are not removed from the excessReplicateMap attribute. I think this is a bug. If my understanding is wrong, could anybody tell me when replicas are removed from the excessReplicateMap attribute?
Re: DFSOutputStream.sync() method latency time
The sync method includes the code below:

// Flush only if we haven't already flushed till this offset.
if (lastFlushOffset != bytesCurBlock) {
  assert bytesCurBlock > lastFlushOffset;
  // record the valid offset of this flush
  lastFlushOffset = bytesCurBlock;
  enqueueCurrentPacket();
}

When there are 64K of data in memory, the write method calls enqueueCurrentPacket to send one packet to the pipeline. But when the data in memory is less than 64K, the write method does not call enqueueCurrentPacket and so does not send data to the pipeline; the client then calls the sync method, which calls enqueueCurrentPacket to send the data to the pipeline and waits for the ack.

2013/3/29 Yanbo Liang yanboha...@gmail.com "The write method writes data to the memory of the client, the sync method sends the packet to the pipeline": I think you made a mistake in understanding the write procedure of HDFS. It's right that the write method writes data to the client's memory; however, the data in the client memory is sent to the DataNodes as soon as it fills the client buffer. This is done by another thread, so it's a concurrent operation. The sync method behaves the same, except that it is used for the last packet in the stream and waits until an ack has been received from the DataNodes. The write method and the sync method are not concurrent with each other; each of them is concurrent with the background thread which transfers data to the DataNodes. And I guess you can understand Chinese, so I recommend one of my blog posts (http://yanbohappy.sinaapp.com/?p=143), which explains the write workflow in detail.

2013/3/29 lei liu liulei...@gmail.com Thanks Yanbo for your reply. My test code is:

FSDataOutputStream outputStream = fs.create(path);
Random r = new Random();
long totalBytes = 0;
String str = new String(new byte[1024]);
while (totalBytes < 1024 * 1024 * 500) {
  byte[] bytes = ("start_" + r.nextLong() + "_" + str + r.nextLong() + "_end" + "\n").getBytes();
  outputStream.write(bytes);
  outputStream.sync();
  totalBytes = totalBytes + bytes.length;
}
outputStream.close();

The write method and sync method are called sequentially, so the two methods are not concurrent. The write method writes data to the client's memory, the sync method sends the packet to the pipeline, and the client cannot execute the next write until the sync method returns success, so I think the sync method latency should equal the superposition of each datanode's operation.

2013/3/28 Yanbo Liang yanboha...@gmail.com 1st, when a client wants to write data to HDFS, it creates a DFSOutputStream. The client then writes data to this output stream, and the stream transfers the data to all the DataNodes through the constructed pipeline in packets of 64KB. These two operations are concurrent, so the write latency is not a simple superposition. 2nd, the sync method only flushes the last packet (at most 64KB) to the pipeline. Because of the concurrent processing of all these operations, the latency is smaller than the superposition of each operation. It's parallel computing rather than serial computing, in a sense.

2013/3/28 lei liu liulei...@gmail.com When a client writes data with three replicas, the sync method latency formula should be: sync method latency = first datanode receive time + second datanode receive time + third datanode receive time. If each of the three datanodes' receive time is 2 milliseconds, the sync method latency should be 6 milliseconds, but according to our monitoring, the sync method latency is 2 milliseconds.
How to calculate sync method latency time? Thanks, LiuLei
DFSOutputStream.sync() method latency time
When a client writes data with three replicas, I thought the sync method latency formula should be: sync method latency = first datanode receive time + second datanode receive time + third datanode receive time. If each of the three datanodes' receive time is 2 milliseconds, the sync method latency should be 6 milliseconds, but according to our monitoring, the sync method latency is 2 milliseconds. How is the sync method latency calculated? Thanks, LiuLei
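As a concrete way to observe this from the client side, here is a minimal sketch (path and sizes are hypothetical). Because packets are streamed down the pipeline while write() is still running, the measured latency is usually close to one pipeline round trip rather than the sum of the per-datanode times.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SyncLatency {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/tmp/sync-latency-test"));
    byte[] data = new byte[1024];
    long start = System.nanoTime();
    out.write(data);
    out.sync(); // deprecated in later releases in favor of hflush()
    long micros = (System.nanoTime() - start) / 1000;
    System.out.println("write+sync took " + micros + " us");
    out.close();
  }
}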
Re: DFSOutputStream.sync() method latency time
Thanks Yanbo for your reply. My test code is:

FSDataOutputStream outputStream = fs.create(path);
Random r = new Random();
long totalBytes = 0;
String str = new String(new byte[1024]);
while (totalBytes < 1024 * 1024 * 500) {
  byte[] bytes = ("start_" + r.nextLong() + "_" + str + r.nextLong() + "_end" + "\n").getBytes();
  outputStream.write(bytes);
  outputStream.sync();
  totalBytes = totalBytes + bytes.length;
}
outputStream.close();

The write method and sync method are called sequentially, so the two methods are not concurrent. The write method writes data to the client's memory, the sync method sends the packet to the pipeline, and the client cannot execute the next write until the sync method returns success, so I think the sync method latency should equal the superposition of each datanode's operation.

2013/3/28 Yanbo Liang yanboha...@gmail.com 1st, when a client wants to write data to HDFS, it creates a DFSOutputStream. The client then writes data to this output stream, and the stream transfers the data to all the DataNodes through the constructed pipeline in packets of 64KB. These two operations are concurrent, so the write latency is not a simple superposition. 2nd, the sync method only flushes the last packet (at most 64KB) to the pipeline. Because of the concurrent processing of all these operations, the latency is smaller than the superposition of each operation. It's parallel computing rather than serial computing, in a sense.

2013/3/28 lei liu liulei...@gmail.com When a client writes data with three replicas, the sync method latency formula should be: sync method latency = first datanode receive time + second datanode receive time + third datanode receive time. If each of the three datanodes' receive time is 2 milliseconds, the sync method latency should be 6 milliseconds, but according to our monitoring, the sync method latency is 2 milliseconds. How is the sync method latency calculated? Thanks, LiuLei
same edits file is loaded more than once
I am using hadoop-0.20.2 and I want to use the HDFS HA function, so I am researching AvatarNode. I find that if the standby NN fails to complete a checkpoint, the same edits file is loaded again the next time the standby NN does a checkpoint. Can the same edits file be loaded more than once in hadoop-0.20.2? If not, what is the harm? Thanks, LiuLei
Re: ClientProtocol create, mkdirs, rename and delete methods are not idempotent
I want to know which applications are idempotent or not idempotent, and why? Could you give me an example? Thank you. 2012/10/29 Ted Dunning tdunn...@maprtech.com Create cannot be idempotent because of the problem of watches and sequential files. Similarly, mkdirs, rename and delete cannot generally be idempotent. In particular applications, you might find it is OK to treat them as such, but there are definitely applications where they are not idempotent. On Sun, Oct 28, 2012 at 2:40 AM, lei liu liulei...@gmail.com wrote: I think these methods should be idempotent; repeated calls by the same client should be harmless. Thanks, LiuLei
Re: ClientProtocol create, mkdirs, rename and delete methods are not idempotent
Hi Steve, Thank you for your detailed and patient answer. I understand now. 2012/11/5 Steve Loughran ste...@hortonworks.com On 4 November 2012 17:25, lei liu liulei...@gmail.com wrote: I want to know which applications are idempotent or not idempotent, and why? Could you give me an example? When you say idempotent, I presume you mean the operation happens at-most-once, ignoring the degenerate case where all requests are rejected. You can take operations that fail if their conditions aren't met (delete the path named "something") as the simplest. The operation can send an error back, 'file not found', but the client library can then downgrade that to an idempotent assertion: when the acknowledgment was sent from the namenode, there was nothing at the end of this path. That will hold on a replay, though if someone creates a file in between, the replay could be observable. Now what about move(src, dest)? If it succeeds, then there is no src path, as it is now at dest. What happens if you call it a second time? There is no src, only dest. You can't report that back as a success, as it is clearly a failure: no src, no dest. It's hard to convert that into an assertion on the observable state of the system, as the state doesn't reflect the history, so you need some temporal logic in there too: at time t0 there existed a directory src; at time t1 the directory src no longer existed and its contents were now found under directory dest. And again, what happens if, worse, someone else did something in between and created a src directory (which it could do, given that the first one has been renamed to dest)? The operation replays and the move takes place twice: you've just crossed into at-least-once operations, which is not what you wanted. At this point I'm sure you are thinking of having some kind of transaction journal, recording that at time Tn, transaction Xn moved the dir. Which means you have to start to collect a transaction log of what happened. Now, effectively HDFS is a journalled file system; it does record a lot of things. It just doesn't record user transactions with it, or rescan the log whenever any operation comes in so as to decide what to ignore. Or you just skip the filesystem changes and have some data structure recording recent transaction IDs, and ignore repeated requests with the same IDs. Better, though you'd need to make that failure resistant: its state must propagate to the journal and any failover namenodes so that a transaction replay will be idempotent even if the filesystem fails over between the original and the replayed transaction. And of course all of this needs to be atomic with the filesystem state changes... Summary: it gets complicated fast. Throwing errors back to the caller makes life a lot simpler and lets the caller choose its own outcome, even though that's not always satisfactory. Alternatively: it's not that people don't want globally distributed transactions, it's just hard. 2012/10/29 Ted Dunning tdunn...@maprtech.com Create cannot be idempotent because of the problem of watches and sequential files. Similarly, mkdirs, rename and delete cannot generally be idempotent. In particular applications, you might find it is OK to treat them as such, but there are definitely applications where they are not idempotent. On Sun, Oct 28, 2012 at 2:40 AM, lei liu liulei...@gmail.com wrote: I think these methods should be idempotent; repeated calls by the same client should be harmless. Thanks, LiuLei
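A minimal sketch of the "downgrade to an assertion" idea Steve describes, for delete only (the helper name is mine): the caller retries, and a "nothing at the path" outcome on the retry is reported as success because the desired post-condition already holds.

import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class IdempotentDelete {
  // Returns true when, after the call, nothing exists at the path.
  public static boolean deleteAsAssertion(FileSystem fs, Path path)
      throws IOException {
    try {
      if (!fs.exists(path)) {
        return true; // already gone: perhaps our earlier, timed-out attempt succeeded
      }
      return fs.delete(path, true);
    } catch (FileNotFoundException e) {
      return true; // raced with another deleter; the post-condition still holds
    }
  }
}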
ClientProtocol create, mkdirs, rename and delete methods are not idempotent
I think these methods should be idempotent; repeated calls by the same client should be harmless. Thanks, LiuLei
Re: ClientProtocol create, mkdirs, rename and delete methods are not idempotent
Thanks Ted for your reply. What is the problem of watches and sequential files? If you can describe it in detail, I can understand the problem better. 2012/10/29 Ted Dunning tdunn...@maprtech.com Create cannot be idempotent because of the problem of watches and sequential files. Similarly, mkdirs, rename and delete cannot generally be idempotent. In particular applications, you might find it is OK to treat them as such, but there are definitely applications where they are not idempotent. On Sun, Oct 28, 2012 at 2:40 AM, lei liu liulei...@gmail.com wrote: I think these methods should be idempotent; repeated calls by the same client should be harmless. Thanks, LiuLei
Re: HDFS HA IO Fencing
I used NFS v4 to test the Java FileLock. The 192.168.1.233 machine is the NFS server; the NFS configuration in the /etc/exports file is:

/home/hdfs.ha/share 192.168.1.221(rw,sync,no_root_squash)
/home/hdfs.ha/share 192.168.1.222(rw,sync,no_root_squash)

I run the commands below to start the NFS server: service nfs start; service nfslock start. The 192.168.1.221 and 192.168.1.222 machines are NFS clients; the NFS configuration in the /etc/fstab file is 192.168.1.223:/home/hdfs.ha/share /home/hdfs.ha/share nfs rsize=8192,wsize=8192,timeo=14,intr. I run the commands below to start the NFS client: service nfs start; service nfslock start. I wrote a program to acquire a file lock:

public class FileLockTest {
  FileLock lock;

  public void lock(String path, boolean isShare) throws IOException {
    this.lock = tryLock(path, isShare);
    if (lock == null) {
      String msg = "Cannot lock storage " + path + ". The directory is already locked.";
      System.out.println(msg);
      throw new IOException(msg);
    }
  }

  private FileLock tryLock(String path, boolean isShare) throws IOException {
    boolean deletionHookAdded = false;
    File lockF = new File(path);
    if (!lockF.exists()) {
      lockF.deleteOnExit();
      deletionHookAdded = true;
    }
    RandomAccessFile file = new RandomAccessFile(lockF, "rws");
    FileLock res = null;
    try {
      res = file.getChannel().tryLock(0, Long.MAX_VALUE, isShare);
    } catch (OverlappingFileLockException oe) {
      file.close();
      return null;
    } catch (IOException e) {
      e.printStackTrace();
      file.close();
      throw e;
    }
    if (res != null && !deletionHookAdded) {
      // If the file existed prior to our startup, we didn't
      // call deleteOnExit above. But since we successfully locked
      // the dir, we can take care of cleaning it up.
      lockF.deleteOnExit();
    }
    return res;
  }

  public static void main(String[] s) {
    FileLockTest fileLockTest = new FileLockTest();
    try {
      fileLockTest.lock(s[0], Boolean.valueOf(s[1]));
      Thread.sleep(1000 * 60 * 60 * 1);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

I ran two test cases. 1. The network is OK. I run java -cp ./filelock.jar lock.FileLockTest /home/hdfs.ha/share/test.lock false on 192.168.1.221 to hold the file lock, and then run the same command to acquire the same file lock on 192.168.1.222; it throws the exception below: Cannot lock storage /home/hdfs.ha/share/test.lock. The directory is already locked. java.io.IOException: Cannot lock storage /home/hdfs.ha/share/test.lock. The directory is already locked. at lock.FileLockTest.lock(FileLockTest.java:18) at lock.FileLockTest.main(FileLockTest.java:53) 2. The machine which holds the file lock is disconnected. I run java -cp ./filelock.jar lock.FileLockTest /home/hdfs.ha/share/test.lock false on 192.168.1.221, and then the 192.168.1.221 machine is disconnected from the network. After three minutes, I run the same command on 192.168.1.222, and it can acquire the file lock. I used the mount | grep nfs command to examine the mounted NFS directory on 192.168.1.221; the share directory /home/hdfs.ha/share/ had disappeared on the 192.168.1.221 machine. So I think that when a machine is disconnected for a long time, another machine can acquire the same file lock.
Re: HDFS HA IO Fencing
We are using NFS for shared storage. Can we use the Linux nfslock service to implement IO fencing? 2012/10/26 Steve Loughran ste...@hortonworks.com On 25 October 2012 14:08, Todd Lipcon t...@cloudera.com wrote: Hi Liu, Locks are not sufficient, because there is no way to enforce a lock in a distributed system without unbounded blocking. What you might be referring to is a lease, but leases are still problematic unless you can put bounds on the speed with which clocks progress on different machines, _and_ have strict guarantees on the way each node's scheduler works. With Linux and Java, the latter is tough. On any OS running in any virtual environment, including EC2, time is entirely unpredictable, just to make things worse. On a single machine you can use file locking, as the OS will know that the process is dead and closes the file; other programs can attempt to open the same file with exclusive locking and, by getting the right failures, know that something else has the file, hence the other process is live. Shared NFS storage you need to mount with softlock set, precisely to stop file locks lasting until some lease has expired, because the on-host liveness probes detect failure faster and want to react to it. -Steve
[no subject]
http://blog.csdn.net/onlyqi/article/details/6544989
https://issues.apache.org/jira/browse/HDFS-2185
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailability.html
http://blog.csdn.net/chenpingbupt/article/details/7922042
https://issues.apache.org/jira/browse/HADOOP-8163
use DistributedCache to add many files to class path
I use DistributedCache to add two files to the class path, for example with the code below:

String jeJarPath = "/group/aladdin/lib/je-4.1.7.jar";
DistributedCache.addFileToClassPath(new Path(jeJarPath), conf);
String tairJarPath = "/group/aladdin/lib/tair-aladdin-2.3.1.jar";
DistributedCache.addFileToClassPath(new Path(tairJarPath), conf);

When the map/reduce job is executing, the /group/aladdin/lib/tair-aladdin-2.3.1.jar file is added to the class path, but the /group/aladdin/lib/je-4.1.7.jar file is not added to the class path. How can I add several files to the class path? Thanks, LiuLei
create local file in tasktracker node
I want to use hadoop to create a Berkeley DB index, so I need to create one directory to store the Berkeley DB index. There is the code below in my reduce task:

String tmp = job.get("hadoop.tmp.dir");
String shardName = "shard" + this.shardNum + "_" + UUID.randomUUID().toString();
this.localIndexFile = new File(tmp, shardName);
if (!localIndexFile.exists()) {
  boolean isSuccessful = localIndexFile.mkdir();
  LOG.info("create directory " + this.localIndexFile + ": " + isSuccessful);
}

but the localIndexFile.mkdir() method returns false. Could anyone tell me why the method returns false? Is it that my reduce task instance doesn't have permission? Thanks, LiuLei
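A minimal diagnostic sketch (a fragment reusing tmp, shardName and LOG from the code above) to narrow down the false return: File.mkdir() fails when the parent directory does not exist or is not writable by the task's user, whereas mkdirs() also creates the missing parents.

File dir = new File(tmp, shardName);
File parent = dir.getParentFile();
// Log whether the parent exists and is writable before attempting creation.
LOG.info("parent=" + parent + " exists=" + parent.exists()
    + " canWrite=" + parent.canWrite());
// mkdirs() also creates any missing intermediate directories.
boolean created = dir.mkdirs();
LOG.info("mkdirs " + dir + ": " + created);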
Does one map instance only handle one input path at the same time?
There are two input directories, /user/test1/ and /user/test2/, and I want to join the contents of the two directories. In order to join them, I need to identify which directory the content handled by the mapper comes from, so I use the code below in the mapper:

private int tag = -1;

@Override
public void configure(JobConf conf) {
  try {
    this.conf = conf;
    // example: conf.set("paths.to.alias", "0=/user/test1/,1=/user/test2/")
    String pathsToAliasStr = conf.get("paths.to.alias");
    String[] pathsToAlias = pathsToAliasStr.split(",");
    Path fpath = new Path((new Path(conf.get("map.input.file"))).toUri().getPath());
    String path = fpath.toUri().toString();
    for (int i = 0; i < pathsToAlias.length; i++) {
      String[] pathToAlias = pathsToAlias[i].split("=");
      if (path.startsWith(pathToAlias[1])) {
        // identifies which directory's content the current map instance is handling
        tag = Integer.valueOf(pathToAlias[0].trim());
      }
    }
  } catch (Throwable e) {
    e.printStackTrace();
    throw new RuntimeException(e);
  }
}

So when the map method runs, the content handled by the mapper is identified as coming from one directory. I want to know whether one mapper instance only handles the content of one directory at the same time. Thanks, LiuLei
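For completeness, a minimal sketch of the driver-side setup this mapper assumes (the paths are taken from the question, the property name from the code above):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class JoinDriverSketch {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Map each input directory to the tag the mapper recovers in configure().
    conf.set("paths.to.alias", "0=/user/test1/,1=/user/test2/");
    FileInputFormat.addInputPath(conf, new Path("/user/test1/"));
    FileInputFormat.addInputPath(conf, new Path("/user/test2/"));
    // ... mapper class, output settings and job submission elided ...
  }
}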
how does hadoop handle the counters of failed tasks and speculative tasks
I define a counter to count the bad records; there is the code below in the map task: reporter.incrCounter("bad", "records", 1). When the job is completed, I print the result with the code below: long total = counters.findCounter("bad", "records").getCounter(); But I have two questions about the counter: 1. If the map task is retried 4 times and the last attempt is successful, I think the counters of the other three attempts should not be included in the final result; is that right? 2. If there are speculative tasks, I think the counters of the speculative tasks should not be included in the final result; is that right? Thanks, LiuLei
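A minimal end-to-end sketch of the counter round trip in the old mapred API (job setup details elided). As far as I know, the job-level totals aggregate only the counters of the successful attempt of each task, so failed retries and killed speculative attempts are excluded.

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class BadRecordCounter {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf();
    // ... input/output/mapper configuration elided ...
    // Inside the map task one would call:
    //   reporter.incrCounter("bad", "records", 1);
    RunningJob job = JobClient.runJob(conf);
    // Only the successful attempt of each task contributes to these totals.
    Counters counters = job.getCounters();
    long total = counters.findCounter("bad", "records").getCounter();
    System.out.println("bad records: " + total);
  }
}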
Virtual Columns error
I use the hive-0.6 version and execute the statement 'select INPUT_FILE_NAME, BLOCK_OFFSET_INSIDE_FILE from person1', and hive-0.6 throws the error below: FAILED: Error in semantic analysis: line 1:7 Invalid Table Alias or Column Reference INPUT_FILE_NAME. Doesn't hive-0.6 support virtual columns?
how to create index on one table
I use hive-0.6 and I want to create an index on one table; how can I do it?
Re: how to export create statement for one table
I know the describe statement, but it doesn't display FIELDS TERMINATED and LINES TERMINATED; it only displays the column names and column types. 2010/9/19 Ted Yu yuzhih...@gmail.com See the bottom of http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL On Sat, Sep 18, 2010 at 7:13 PM, lei liu liulei...@gmail.com wrote: I used the statement below to create one table:

CREATE TABLE page_view(viewTime INT, userid BIGINT,
    page_url STRING, referrer_url STRING,
    ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
STORED AS SEQUENCEFILE;

Now I want to export the DDL for the page_view table; how can I do it?
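For what it's worth, a minimal sketch using the standalone JDBC idiom from the HiveClient wiki (host and port are hypothetical): DESCRIBE EXTENDED prints a detailed table information row that includes the serde parameters, which is where the field and line delimiters show up.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DescribeExtendedExample {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    Connection con = DriverManager.getConnection(
        "jdbc:hive://localhost:10000/default", "", "");
    Statement stmt = con.createStatement();
    // The last row carries the detailed table information, including
    // the serde parameters (field.delim, line.delim, ...).
    ResultSet rs = stmt.executeQuery("DESCRIBE EXTENDED page_view");
    while (rs.next()) {
      System.out.println(rs.getString(1));
    }
    con.close();
  }
}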
add partition
I use the statements below to create one table and add one partition: create external table test(userid bigint, name string, age int) partitioned by(pt string); alter table test add partition(pt='01'); Now there is one file in HDFS whose path is /user/hive/warehouse/user, and I use a load statement to load the file into the partition: load data inpath '/user/hive/warehouse/user' into table test partition(pt='01'). I find that the file path is changed from /user/hive/warehouse/user to /user/hive/warehouse/test/pt=01. I don't want the file path to change; how can I do that?
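One hedged alternative sketch, using the same JDBC idiom as the other threads in this archive (connection details hypothetical): since test is an external table, pointing the partition at the existing location avoids the move that LOAD DATA performs. This assumes /user/hive/warehouse/user is a directory holding the data files.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AddPartitionInPlace {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    Connection con = DriverManager.getConnection(
        "jdbc:hive://localhost:10000/default", "", "");
    Statement stmt = con.createStatement();
    // Register the partition at the data's current location instead of
    // loading (and therefore moving) the file into the table directory.
    stmt.execute("ALTER TABLE test ADD PARTITION (pt='01') "
        + "LOCATION '/user/hive/warehouse/user'");
    con.close();
  }
}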
how to connect to the metastore server with the hive JDBC client
I use the ./hive --service metastore command to start the metastore server; how do I connect to the metastore server with the hive JDBC client?
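For reference, a minimal connection sketch (host and port hypothetical). As far as I can tell from the HiveClient wiki, the JDBC driver speaks to the hiveserver service ("hive --service hiveserver"), not to the metastore service; the metastore is consumed by Hive itself rather than by JDBC clients.

import java.sql.Connection;
import java.sql.DriverManager;

public class HiveJdbcConnect {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    // Standalone mode: the URI points at a running hiveserver instance.
    Connection con = DriverManager.getConnection(
        "jdbc:hive://localhost:10000/default", "", "");
    con.close();
  }
}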
how to export the DDL statement for one table
I used the statement below to create one table:

CREATE TABLE page_view(viewTime INT, userid BIGINT,
    page_url STRING, referrer_url STRING,
    ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
STORED AS SEQUENCEFILE;

Now I want to export the DDL for page_view; how can I do it?
how to export create statement for one table
I used the statement below to create one table:

CREATE TABLE page_view(viewTime INT, userid BIGINT,
    page_url STRING, referrer_url STRING,
    ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
STORED AS SEQUENCEFILE;

Now I want to export the DDL for the page_view table; how can I do it?
GroupByOperator class confusion: it can result in out of memory
I find that GroupByOperator caches the aggregation results of different keys. Please look at the code below:

AggregationBuffer[] aggs = null;
boolean newEntryForHashAggr = false;
keyProber.hashcode = newKeys.hashCode();
// use this to probe the hashmap
keyProber.keys = newKeys;
// hash-based aggregations
aggs = hashAggregations.get(keyProber);
ArrayList<Object> newDefaultKeys = null;
if (aggs == null) {
  newDefaultKeys = deepCopyElements(keyObjects, keyObjectInspectors,
      ObjectInspectorCopyOption.WRITABLE);
  KeyWrapper newKeyProber = new KeyWrapper(keyProber.hashcode, newDefaultKeys, true);
  aggs = newAggregations();
  hashAggregations.put(newKeyProber, aggs);
  newEntryForHashAggr = true;
  numRowsHashTbl++; // new entry in the hash table
}

When there are, say, one million distinct keys with about 10k of aggregation state each, that will occupy about 10G of memory, and the JVM will run out of memory. Could anybody tell me how to handle this? Thanks, LiuLei
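A hedged sketch of the usual knobs for this, set through JDBC as elsewhere in this archive (parameter names as I recall them from contemporary Hive releases; defaults may vary by version): hash aggregation can be bounded or disabled so the map-side hash table does not grow without limit.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class GroupByMemorySettings {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    Connection con = DriverManager.getConnection(
        "jdbc:hive://localhost:10000/default", "", "");
    Statement stmt = con.createStatement();
    // Fraction of the heap the map-side hash table may use before flushing.
    stmt.execute("set hive.map.aggr.hash.percentmemory=0.3");
    // Or disable map-side hash aggregation entirely.
    stmt.execute("set hive.map.aggr=false");
    con.close();
  }
}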
hive-0.6 doesn't connect to mysql in the metastore
I use hive-0.6 and use mysql as the metastore, but hive doesn't connect to mysql. 2010-08-30 13:28:24,982 ERROR [main] util.Log4JLogger(125): Failed initialising database. Invalid URL: jdbc:mysql://127.0.0.1:3306/hive6?createDatabaseIfNotExist=true org.datanucleus.exceptions.NucleusDataStoreException: Invalid URL: jdbc:mysql://127.0.0.1:3306/hive6?createDatabaseIfNotExist=true at org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:536) at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:290) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source) at java.lang.reflect.Constructor.newInstance(Unknown Source) at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:588) at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:300) at org.datanucleus.ObjectManagerFactoryImpl.initialiseStoreManager(ObjectManagerFactoryImpl.java:161) at org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:583) at org.datanucleus.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:286) at org.datanucleus.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:182) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at javax.jdo.JDOHelper$16.run(JDOHelper.java:1958) at java.security.AccessController.doPrivileged(Native Method) at javax.jdo.JDOHelper.invoke(JDOHelper.java:1953) at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159) at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803) at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698) at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:191) at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:208) at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:153) at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:128) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:54) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:276) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.executeWithRetry(HiveMetaStore.java:228) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:374) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:166) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:125) at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.<init>(HiveServer.java:79) at org.apache.hadoop.hive.jdbc.HiveConnection.<init>(HiveConnection.java:85) at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:110) at java.sql.DriverManager.getConnection(Unknown Source) at java.sql.DriverManager.getConnection(Unknown Source) at my.examples.multithreadquery.SimpleSql.main(SimpleSql.java:21) Caused by:
java.sql.SQLException: Invalid URL: jdbc:mysql://127.0.0.1:3306/hive6?createDatabaseIfNotExist=true at org.apache.hadoop.hive.jdbc.HiveConnection.<init>(HiveConnection.java:76) at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:110) at java.sql.DriverManager.getConnection(Unknown Source) at java.sql.DriverManager.getConnection(Unknown Source) at org.apache.commons.dbcp.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:75) at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:582) at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1148) at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106) at org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:521) ... 38 more I find that hive-0.6 connects to mysql with org.apache.hadoop.hive.jdbc.HiveDriver. I think that is wrong; it should use com.mysql.jdbc.Driver to connect to mysql. Below is my configuration:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://127.0.0.1:3306/hive6?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC
how does hive add hive_exec.jar to hadoop
When hadoop runs a job which is submitted by hive, hadoop needs hive_exec.jar. How does hive add hive_exec.jar to the hadoop job? Please tell me where the code for this is in hive. Thanks, LiuLei
Re: java.sql.SQLException: org.apache.thrift.transport.TTransportException: Cannot read. Remote side has closed. Tried to read 1 bytes, but only got 0 bytes.
Yes, you are right, I do that. But after the hive server has run for several days, when a client connects to the hive server, the client receives the exception. 2010/8/23 Adarsh Sharma adarsh.sha...@orkash.com For running Hive in server mode, first you have to start the hiveserver service: $bin/hive --service hiveserver and then run the code. lei liu wrote: Hello everyone, I use JDBC to connect to the hive server, and sometimes I receive the exception below: java.sql.SQLException: org.apache.thrift.transport.TTransportException: Cannot read. Remote side has closed. Tried to read 1 bytes, but only got 0 bytes. Please tell me the reason. Thanks, LiuLei
Re: Re: how to support chinese in hive
Hi shangan, You need to set the Linux encoding to UTF-8. 2010/8/16 shangan shan...@corp.kaixin001.com The fact is that even if I have data in UTF-8 using simplified Chinese, doing a select * will return an unreadable result. Does that mean hive can only support ASCII characters? 2010-08-16 -- shangan -- From: Jeff Hammerbacher Sent: 2010-08-15 07:19:27 To: hive-user Subject: Re: how to support chinese in hive Hey shangan, There's a ticket open to make Hive work with non-UTF-8 codecs at https://issues.apache.org/jira/browse/hive-1505. Perhaps you could add more about your needs there? Later, Jeff On Fri, Aug 13, 2010 at 4:02 AM, shangan shan...@corp.kaixin001.com wrote: hi, all. Could anyone tell me how to configure hive in order to support Chinese characters? And when using hwi, how do I configure the directory of the result file? By default it is the 'conf' directory under my installation path. 2010-08-13 -- shangan
Re: what is the difference between hive local mode and standalone mode
You can look at the http://wiki.apache.org/hadoop/Hive/HiveClient page. For local mode the URI is just jdbc:hive://; for standalone mode the URI is jdbc:hive://host:port/dbname. When we use local mode, my application and the hive server run in the same VM, so we don't need to maintain a hive server. I think that is an advantage of local mode. I want to know what the disadvantages are when we use local mode. 2010/8/14 Joydeep Sen Sarma jssa...@facebook.com Lei, not sure I understand the question. I tried to document the relationship between hive, MR and local-mode at http://wiki.apache.org/hadoop/Hive/GettingStarted#Hive.2C_Map-Reduce_and_Local-Mode recently; perhaps you have already read it. Regarding whether local mode can be run on windows or not, I really don't know. First of all, hadoop has to be runnable in local mode on windows (using cygwin I presume?). Then one has to test hive against this; one would think it should work if hadoop does, but we would have to verify. (i.e. yes, it should be possible in theory, but in practice there are probably bugs that need to get sorted out for this to happen.) From: lei liu [mailto:liulei...@gmail.com] Sent: Friday, August 13, 2010 9:10 AM To: hive-user@hadoop.apache.org Subject: what is the difference between hive local mode and standalone mode What is the difference between hive local mode and standalone mode? Can hive local mode be run on windows?
what is the difference between hive local mode and standalone mode
What is the difference between hive local mode and standalone mode? Can hive local mode be run on windows?
Re: How to use the JDBC client embedded mode
Thank you for your reply. I had looked at the http://wiki.apache.org/hadoop/Hive/HiveClient#JDBC page before. What does the embedded mode mentioned on the page mean? Is that hive embedded mode? I mean that I don't need to start hive; the hive server can be embedded into my application, and my application doesn't need a network connection to access the hive server. 2010/8/11 Bill Graham billgra...@gmail.com The code and start script shown in this section of the wiki show how to run hive in embedded mode. http://wiki.apache.org/hadoop/Hive/HiveClient#JDBC Compile the code after changing the JDBC URI to 'jdbc:hive://' and run the example script. This will run the code, which will start Hive in embedded mode, create a table, do some operations on it, and then drop it. On Tue, Aug 10, 2010 at 8:05 AM, lei liu liulei...@gmail.com wrote: Can anybody answer the question? Thanks, LiuLei 2010/8/10 lei liu liulei...@gmail.com I see the content below on the http://wiki.apache.org/hadoop/Hive/HiveClient page: For embedded mode, the URI is just jdbc:hive://. How can I use the JDBC client embedded mode? Could anybody give me an example?
Re: How to merge small files
Thank you for your reply. Could you tell me why it is slower if the two parameters are true, and how much slower it is? 2010/8/10 Namit Jain nj...@facebook.com Yes, it will try to run another map-reduce job to merge the files. From: lei liu [liulei...@gmail.com] Sent: Monday, August 09, 2010 8:57 AM To: hive-user@hadoop.apache.org Subject: Re: How to merge small files Could you tell me whether the query is slower if both parameters are true? 2010/8/9 Namit Jain nj...@facebook.com That's right. From: lei liu [liulei...@gmail.com] Sent: Sunday, August 08, 2010 7:18 PM To: hive-user@hadoop.apache.org Subject: Re: How to merge small files Thank you for your reply. You mean I should execute the statements below: statement.execute("set hive.merge.mapfiles=true"); statement.execute("set hive.merge.mapredfiles=true"); The two parameters are both true, right? 2010/8/6 Namit Jain nj...@facebook.com HIVEMERGEMAPFILES("hive.merge.mapfiles", true), HIVEMERGEMAPREDFILES("hive.merge.mapredfiles", false), Set the above parameters to true before your query. From: lei liu [liulei...@gmail.com] Sent: Thursday, August 05, 2010 8:47 PM To: hive-user@hadoop.apache.org Subject: How to merge small files When I run the SQL below: INSERT OVERWRITE TABLE tablename1 select_statement1 FROM from_statement, many files whose size is zero are stored in hadoop. How can I merge these small files? Thanks, LiuLei
how to call the UDF/UDAF in hive
Hello everyone, could anybody tell me how to call a UDF/UDAF in hive?
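For the built-in functions, a minimal sketch (table and column names are borrowed from the page_view example elsewhere in this archive and are hypothetical here): a UDF is called like an ordinary function in the select list, and a UDAF inside an aggregation.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class UdfUdafExample {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    Connection con = DriverManager.getConnection(
        "jdbc:hive://localhost:10000/default", "", "");
    Statement stmt = con.createStatement();
    // UDF: one output value per input row.
    ResultSet r1 = stmt.executeQuery(
        "SELECT concat(page_url, referrer_url) FROM page_view");
    // UDAF: one output value per group.
    ResultSet r2 = stmt.executeQuery(
        "SELECT userid, count(1) FROM page_view GROUP BY userid");
    while (r2.next()) {
      System.out.println(r2.getString(1) + "\t" + r2.getLong(2));
    }
    con.close();
  }
}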
Re: How to merge small files
Could you tell me whether the query is slower if both parameters are true? 2010/8/9 Namit Jain nj...@facebook.com That's right. From: lei liu [liulei...@gmail.com] Sent: Sunday, August 08, 2010 7:18 PM To: hive-user@hadoop.apache.org Subject: Re: How to merge small files Thank you for your reply. You mean I should execute the statements below: statement.execute("set hive.merge.mapfiles=true"); statement.execute("set hive.merge.mapredfiles=true"); The two parameters are both true, right? 2010/8/6 Namit Jain nj...@facebook.com HIVEMERGEMAPFILES("hive.merge.mapfiles", true), HIVEMERGEMAPREDFILES("hive.merge.mapredfiles", false), Set the above parameters to true before your query. From: lei liu [liulei...@gmail.com] Sent: Thursday, August 05, 2010 8:47 PM To: hive-user@hadoop.apache.org Subject: How to merge small files When I run the SQL below: INSERT OVERWRITE TABLE tablename1 select_statement1 FROM from_statement, many files whose size is zero are stored in hadoop. How can I merge these small files? Thanks, LiuLei
How to use the JDBC client embedded mode
I see the content below on the http://wiki.apache.org/hadoop/Hive/HiveClient page: For embedded mode, the URI is just jdbc:hive://. How can I use the JDBC client embedded mode? Could anybody give me an example?
Re: How to merge small files
Thank you for your reply. You mean I should execute the statements below: statement.execute("set hive.merge.mapfiles=true"); statement.execute("set hive.merge.mapredfiles=true"); The two parameters are both true, right? 2010/8/6 Namit Jain nj...@facebook.com HIVEMERGEMAPFILES("hive.merge.mapfiles", true), HIVEMERGEMAPREDFILES("hive.merge.mapredfiles", false), Set the above parameters to true before your query. From: lei liu [liulei...@gmail.com] Sent: Thursday, August 05, 2010 8:47 PM To: hive-user@hadoop.apache.org Subject: How to merge small files When I run the SQL below: INSERT OVERWRITE TABLE tablename1 select_statement1 FROM from_statement, many files whose size is zero are stored in hadoop. How can I merge these small files? Thanks, LiuLei
JDBC embedded mode
How can I use the embedded mode of JDBC? Could anybody give me an example?
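A minimal sketch of embedded mode, following the HiveClient wiki: with the bare "jdbc:hive://" URI the driver starts Hive inside this JVM, so no server process is needed (the Hive jars and configuration must be on the classpath).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveEmbedded {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    // The empty host/port means embedded mode: Hive runs in this process.
    Connection con = DriverManager.getConnection("jdbc:hive://", "", "");
    Statement stmt = con.createStatement();
    stmt.execute("SHOW TABLES");
    con.close();
  }
}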
how to debug code in org.apache.hadoop.hive.ql.exec package
How can I debug the code in the org.apache.hadoop.hive.ql.exec package?
Re: why it is slow to use the OR clause instead of the IN clause
When there are one thousand OR clauses, hive throws the exception below: Total MapReduce jobs = 1 Number of reduce tasks is set to 0 since there's no reduce operator java.lang.StackOverflowError at java.beans.Statement.<init>(Statement.java:60) at java.beans.Expression.<init>(Expression.java:47) at java.beans.Expression.<init>(Expression.java:65) at java.beans.PrimitivePersistenceDelegate.instantiate(MetaData.java:79) at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:97) at java.beans.Encoder.writeObject(Encoder.java:54) at java.beans.XMLEncoder.writeObject(XMLEncoder.java:257) at java.beans.Encoder.writeObject1(Encoder.java:206) at java.beans.Encoder.cloneStatement(Encoder.java:219) at java.beans.Encoder.writeExpression(Encoder.java:278) at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:372) at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:97) at java.beans.Encoder.writeObject(Encoder.java:54) at java.beans.XMLEncoder.writeObject(XMLEncoder.java:257) at java.beans.Encoder.writeObject1(Encoder.java:206) at java.beans.Encoder.cloneStatement(Encoder.java:219) at java.beans.Encoder.writeExpression(Encoder.java:278) at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:372) at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:97) at java.beans.Encoder.writeObject(Encoder.java:54) at java.beans.XMLEncoder.writeObject(XMLEncoder.java:257) at java.beans.Encoder.writeExpression(Encoder.java:279) at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:372) at java.beans.DefaultPersistenceDelegate.doProperty(DefaultPersistenceDelegate.java:212) at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:247) at java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:395) at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:100). When there are two hundred OR clauses, it is very, very slow. Now I use the 0.4.1 version; if I upgrade to the 0.6 version, what do I need to do? In addition, when will the 0.6 version be released? Thanks, LiuLei 2010/8/5 Ning Zhang nzh...@facebook.com I tested (1000 disjunctions) and it was extremely slow, but no OOM. The issue seems to be the fact that we serialize the plan by writing to an HDFS file directly. We probably should cache it locally and then write it to HDFS. On Aug 4, 2010, at 10:23 AM, Edward Capriolo wrote: On Wed, Aug 4, 2010 at 1:15 PM, Ning Zhang nzh...@facebook.com wrote: Currently an expression tree (a series of ORs in this case) is not collapsed to one operator, nor are any other optimizations applied. It would be great to have an optimization rule to convert an OR operator tree into one IN operator. Would you be able to file a JIRA and contribute a patch? On Aug 4, 2010, at 7:46 AM, Mark Tozzi wrote: I haven't looked at the code, but I assume the query parser would sort the 'in' terms and then do a binary search lookup into them for each row, while the 'or' terms don't have that kind of obvious relationship and are probably tested in sequence. This would give 'in' O(log N) performance compared to a chain of 'or's having O(N) performance, per row queried. For large N, that could add up. That being said, I'm just speculating here. The query parser may be smart enough to optimize the related 'or's in the same way, or it may not optimize that at all. If I get a chance, I'll try to dig around and see what it's doing, as I have also had a lot of large 'in' queries and could use every drop of performance I can get.
--Mark On Wed, Aug 4, 2010 at 9:47 AM, Edward Capriolo edlinuxg...@gmail.com wrote: On Wed, Aug 4, 2010 at 6:10 AM, lei liu liulei...@gmail.com wrote: Because my company requires that we use the 0.4.1 version, which doesn't support the IN clause, I want to use the OR clause (example: where id=1 or id=2 or id=3) to implement the IN clause (example: id in (1,2,3)). I know it will be slower, especially when the list after 'in' is very long. Could anybody tell me why it is slow to use the OR clause to implement the IN clause? Thanks, LiuLei I cannot imagine the performance difference between 'or' and 'in' would be that great, but I never benchmarked it. The big looming problem is that if you string enough 'or's together (say 8000), the query parser, which uses java beans serialization, will OOM. Edward For reference I did this as a test case: SELECT * FROM src where key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR ...(100 more of these) No OOM, but I gave up after the test case did not go
how to debug hive and hadoop
I have used 'Remote Java Application' in eclipse to debug the hive code; now I want to debug hive and hadoop together. How can I do it? Thanks, LiuLei
How to merge small files
When I run the SQL below: INSERT OVERWRITE TABLE tablename1 select_statement1 FROM from_statement, many files whose size is zero are stored in hadoop. How can I merge these small files? Thanks, LiuLei
why it is slow to use the OR clause instead of the IN clause
Because my company requires that we use the 0.4.1 version, which doesn't support the IN clause, I want to use the OR clause (example: where id=1 or id=2 or id=3) to implement the IN clause (example: id in (1,2,3)). I know it will be slower, especially when the list after 'in' is very long. Could anybody tell me why it is slow to use the OR clause to implement the IN clause? Thanks, LiuLei
Re: why it is slow to use the OR clause instead of the IN clause
Hello Edward Capriolo, Thank you for your reply. Are you sure that if you string enough 'or's together (say 8000), the query parser, which uses java beans serialization, will OOM? How much memory did you assign to hive? 2010/8/4 Edward Capriolo edlinuxg...@gmail.com On Wed, Aug 4, 2010 at 6:10 AM, lei liu liulei...@gmail.com wrote: Because my company requires that we use the 0.4.1 version, which doesn't support the IN clause, I want to use the OR clause (example: where id=1 or id=2 or id=3) to implement the IN clause (example: id in (1,2,3)). I know it will be slower, especially when the list after 'in' is very long. Could anybody tell me why it is slow to use the OR clause to implement the IN clause? Thanks, LiuLei I cannot imagine the performance difference between 'or' and 'in' would be that great, but I never benchmarked it. The big looming problem is that if you string enough 'or's together (say 8000), the query parser, which uses java beans serialization, will OOM. Edward
Re: why it is slow to use the OR clause instead of the IN clause
Now I assign 100M of memory to hive; how many OR terms do you think that can support? 2010/8/5 Edward Capriolo edlinuxg...@gmail.com On Wed, Aug 4, 2010 at 12:15 PM, lei liu liulei...@gmail.com wrote: Hello Edward Capriolo, Thank you for your reply. Are you sure that if you string enough 'or's together (say 8000), the query parser, which uses java beans serialization, will OOM? How much memory did you assign to hive? 2010/8/4 Edward Capriolo edlinuxg...@gmail.com On Wed, Aug 4, 2010 at 6:10 AM, lei liu liulei...@gmail.com wrote: Because my company requires that we use the 0.4.1 version, which doesn't support the IN clause, I want to use the OR clause (example: where id=1 or id=2 or id=3) to implement the IN clause (example: id in (1,2,3)). I know it will be slower, especially when the list after 'in' is very long. Could anybody tell me why it is slow to use the OR clause to implement the IN clause? Thanks, LiuLei I cannot imagine the performance difference between 'or' and 'in' would be that great, but I never benchmarked it. The big looming problem is that if you string enough 'or's together (say 8000), the query parser, which uses java beans serialization, will OOM. Edward That is exactly what I am saying. I tested with 4GB and 8GB. I am not exactly sure how many ORs you can get away with for your memory size, but some upper limit exists currently. Most people never hit it. (I did, because my middle name is 'edge case'.)