How to use patch-13090 in hbase-0.98.6.1

2015-07-24 Thread Song Geng
Hi,

I am new to HBase. I am trying to resolve an issue where client scans time out 
after deletes; the underlying cause is that too many consecutive delete markers 
accumulate before major compaction runs.

I found that patch-13090 could fix the client scan timeout issue, but we use 
hbase-0.98.6.1, and there are too many differences for the patch to apply 
cleanly, so it would need to be ported.

Can anyone help? Thanks.

Br, Great Soul
soul.gr...@me.com
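
As a stop-gap until the patch is ported or the cluster is upgraded, a common 
workaround is to raise the client scanner and RPC timeouts and keep scanner 
caching small, so each RPC covers less server-side work while delete markers 
are being skipped. A minimal sketch against the 0.98 client API; the table 
name and timeout values below are placeholders, and the scanner timeout 
generally has to be raised on the region servers as well:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class SlowScanExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Give the server more time to skip over long runs of delete markers.
    conf.setLong("hbase.client.scanner.timeout.period", 300000); // placeholder: 5 minutes
    conf.setLong("hbase.rpc.timeout", 300000);                   // placeholder: 5 minutes
    HTable table = new HTable(conf, "my_table");                 // hypothetical table name
    try {
      Scan scan = new Scan();
      scan.setCaching(10); // small caching so each RPC returns sooner
      ResultScanner scanner = table.getScanner(scan);
      for (Result r : scanner) {
        // process row
      }
      scanner.close();
    } finally {
      table.close();
    }
  }
}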







JvmPauseMonitor

2015-07-24 Thread jeevi tesh
Hi,
I'm getting the following error:
util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately
After this, HBase hangs.
I changed the garbage collector mode in HBase to G1c1, but the error still persists.
I'm working with HBase 0.96.2 on a single node; HBase is installed on a
single machine with JDK 1.7.
If you have any solution, kindly let me know.
with regards
jeevitesh
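
One way to check whether these pauses really are GC (rather than swapping or 
an overloaded host) is to turn on GC logging for the HBase JVM and compare the 
GC log timestamps with the JvmPauseMonitor warnings. A sketch of the kind of 
options that could go into conf/hbase-env.sh; the log path is a placeholder 
and the flags are standard HotSpot options on JDK 7:

# conf/hbase-env.sh (illustrative only; adjust the log path for your machine)
export HBASE_OPTS="$HBASE_OPTS -XX:+UseG1GC"
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
export HBASE_OPTS="$HBASE_OPTS -Xloggc:/var/log/hbase/gc.log"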


Re: JvmPauseMonitor

2015-07-24 Thread jeevi tesh
Correction: G1GC



Re: JvmPauseMonitor

2015-07-24 Thread jeevi tesh
I'm aware GC is a common event; I have checked the logs and it has happened
several times. But a few times when it happens I also get the following error:
"java.io.IOException: Connection reset by peer"
After this error the HBase system crashes.



Re: JvmPauseMonitor

2015-07-24 Thread jeevi tesh
I'm aware GC is a common event; I have checked the logs and it has happened
several times before. But a few times when it happens I also get the following
error: "java.io.IOException: Connection reset by peer"
After this error the HBase system crashes.



Re[2]: region servers stuck

2015-07-24 Thread Ted Yu

Is it possible for you to upgrade to 0.98.10+ ?

I will take a look at your logs later. 

Thanks

Friday, July 24, 2015, 7:15 PM +0800 from Konstantin Chudinov  
:
>Hello Ted,
>Thank you for your answer!
>Hadoop and HBase versions are: 
>2.3.0-cdh5.1.0 - Hadoop (and HDFS) version
>hbase-0.98.1
>About HDFS: I don't see anything special in the logs. I've attached them to 
>this message. Btw, it's another server which also crashed (I've lost the HDFS 
>logs of the previous server), so its HBase logs are in the archive as well.
>
>Best regards,
>
>Konstantin Chudinov
>
>On 23 Jul 2015, at 20:44, Ted Yu < yuzhih...@gmail.com > wrote:
>>
>>What release of HBase do you use ?
>>
>>I looked at the two log files but didn't find such information. 
>>In the log for node 118, I saw something such as the following:
>>Failed to connect to /10.0.229.16:50010 for block, add to deadNodes and 
>>continue 
>>
>>Was hdfs healthy around the time region server got stuck ?
>>
>>Cheers
>>
>>
>>Friday, July 24, 2015, 12:21 AM +0800 from Konstantin Chudinov  < 
>>kchudi...@griddynamics.com >:
>>>Hi all,
>>>Our team faced a cascade of region servers getting stuck. The RS logs are similar to those in 
>>>HBASE-10499 ( https://issues.apache.org/jira/browse/HBASE-10499 ), except 
>>>there is no RegionTooBusyException before the flush loop:
>>>2015-07-19 07:32:41,961 INFO org.apache.hadoop.hbase.regionserver.HStore: 
>>>Completed major compaction of 2 file(s) in s of table4,\xC7 
>>>,1390920313296.9f554d5828cfa9689de27c1a42d844e3. into 
>>>65dae45c82264b4d80fc7ed0818a4094(size=1.2 M), total size for store is 1.2 M. 
>>>This selection was in queue for 0sec, and took 0sec to execute.
>>>2015-07-19 07:32:41,961 INFO 
>>>org.apache.hadoop.hbase.regionserver.CompactSplitThread: Completed 
>>>compaction: Request = regionName=table4,\xC7 
>>>,1390920313296.9f554d5828cfa9689de27c1a42d844e3., storeName=s, fileCount=2, 
>>>fileSize=1.2 M, priority=998, time=24425664829680753; duration=0sec
>>>2015-07-19 07:32:41,962 INFO 
>>>org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy: 
>>>Default compaction algorithm has selected 1 files from 1 candidates
>>>2015-07-19 07:32:44,764 INFO 
>>>org.apache.hadoop.hbase.regionserver.HRegionServer: 
>>>regionserver60020.periodicFlusher requesting flush for region 
>>>webpage_table,5100,1432632712750.5d3471db423cb08f9ed294c4f3094825. after 
>>>a delay of 18943
>>>2015-07-19 07:32:54,765 INFO 
>>>org.apache.hadoop.hbase.regionserver.HRegionServer: 
>>>regionserver60020.periodicFlusher requesting flush for region 
>>>webpage_table,5100,1432632712750.5d3471db423cb08f9ed294c4f3094825. after 
>>>a delay of 4851
>>>2015-07-19 07:33:04,764 INFO 
>>>org.apache.hadoop.hbase.regionserver.HRegionServer: 
>>>regionserver60020.periodicFlusher requesting flush for region 
>>>webpage_table,5100,1432632712750.5d3471db423cb08f9ed294c4f3094825. after 
>>>a delay of 7466
>>>2015-07-19 07:33:14,764 INFO 
>>>org.apache.hadoop.hbase.regionserver.HRegionServer: 
>>>regionserver60020.periodicFlusher requesting flush for region 
>>>webpage_table,5100,1432632712750.5d3471db423cb08f9ed294c4f3094825. after 
>>>a delay of 4940
>>>2015-07-19 07:33:24,765 INFO 
>>>org.apache.hadoop.hbase.regionserver.HRegionServer: 
>>>regionserver60020.periodicFlusher requesting flush for region 
>>>webpage_table,5100,1432632712750.5d3471db423cb08f9ed294c4f3094825. after 
>>>a delay of 12909
>>>2015-07-19 07:33:34,764 INFO 
>>>org.apache.hadoop.hbase.regionserver.HRegionServer: 
>>>regionserver60020.periodicFlusher requesting flush for region 
>>>webpage_table,5100,1432632712750.5d3471db423cb08f9ed294c4f3094825. after 
>>>a delay of 5897
>>>2015-07-19 07:33:44,764 INFO 
>>>org.apache.hadoop.hbase.regionserver.HRegionServer: 
>>>regionserver60020.periodicFlusher requesting flush for region 
>>>webpage_table,5100,1432632712750.5d3471db423cb08f9ed294c4f3094825. after 
>>>a delay of 9110
>>>2015-07-19 07:33:54,764 INFO 
>>>org.apache.hadoop.hbase.regionserver.HRegionServer: 
>>>regionserver60020.periodicFlusher requesting flush for region 
>>>webpage_table,5100,1432632712750.5d3471db423cb08f9ed294c4f3094825. after 
>>>a delay of 7109
>>>
>>>until we rebooted the RS at 10:08.
>>>8 servers got stuck at the same time.
>>>I haven't found anything in the HMaster's logs. Thread dumps show that many 
>>>threads (including the flush thread) are waiting for a read lock while accessing HDFS:
>>>"RpcServer.handler=19,port=60020" - Thread t@90
>>>  java.lang.Thread.State: WAITING
>>>at java.lang.Object.wait(Native Method)
>>>- waiting on <0184> (a org.apache.hadoop.hbase.util.IdLock$Entry)
>>>at java.lang.Object.wait(Object.java:503)
>>>at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:79)
>>>at 
>>>org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:319)
>>>at 
>>>org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
>>>at 
>>>org.apache.hadoop.hbase.io.hfile.H

Re: Re[2]: region servers stuck

2015-07-24 Thread Serega Sheypak
probably block was being replicated because of DN failure and HBase was
trying to access that replica and got stuck?
I can see that DN answers that some blocks are missing.
or maybe you run HDFS-balancer?

The other thing is that you should always get read access to HDFS by
design, you are not allowed to modify file concurrently, first writer gets
lease on block and NN doesn't allow to get concurrent leases as I remember
it correctly...

See what happens with block 1099777976128
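If corrupt or missing replicas are suspected, the state of that file's blocks
can also be checked directly with fsck; something along these lines (the path
is taken from the log below):

hdfs fsck /hbase/data/default/table7/12a2d1e37fd8f0f9870fc1b5afd6046d/i/983cf03fddfa480f92346f25a61b0b9e -files -blocks -locations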

RS:
2015-07-19 07:25:08,533 INFO org.apache.hadoop.hbase.regionserver.HStore:
Starting compaction of 2 file(s) in i of
table7,\x8C\xA0,1435936455217.12a2d1e37fd8f0f9870fc1b5afd6046d. into
tmpdir=hdfs://server1/hbase/data/default/table7/12a2d1e37fd8f0f9870fc1b5afd6046d/.tmp,
totalSize=416.0 M
2015-07-19 07:25:08,556 WARN org.apache.hadoop.hdfs.BlockReaderFactory:
BlockReaderFactory(fileName=/hbase/data/default/table7/12a2d1e37fd8f0f9870fc1b5afd6046d/i/983cf03fddfa480f92346f25a61b0b9e,
block=BP-1892992341-10.10.122.111-1352825964285:blk_1195579097_1099777976128):
unknown response code ERROR while attempting to set up short-circuit
access. Block
BP-1892992341-10.10.122.111-1352825964285:blk_1195579097_1099777976128 is
not valid
2015-07-19 07:25:08,556 WARN
org.apache.hadoop.hdfs.client.ShortCircuitCache:
ShortCircuitCache(0x6b1f04e2): failed to load
1195579097_BP-1892992341-10.10.122.111-1352825964285
2015-07-19 07:25:08,557 WARN org.apache.hadoop.hdfs.BlockReaderFactory: I/O
error constructing remote block reader.
java.io.IOException: Got error for OP_READ_BLOCK, self=/10.0.241.39:53420,
remote=/10.0.241.39:50010, for file
/hbase/data/default/table7/12a2d1e37fd8f0f9870fc1b5afd6046d/i/983cf03fddfa480f92346f25a61b0b9e,
for pool BP-1892992341-10.10.122.111-1352825964285 block
1195579097_1099777976128
at
org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:432)
at
org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:397)
at
org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:786)
at
org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:665)
at
org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:325)
at
org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:566)
at
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:789)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:836)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
at
org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1210)
at
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1483)
at
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1314)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:355)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.seekTo(HFileReaderV2.java:1052)
at
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:244)
at
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:152)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:317)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:240)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:202)
at
org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:257)
at
org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:65)
at
org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:109)
at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1080)
at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1482)
at
org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:475)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-07-19 07:25:08,558 WARN org.apache.hadoop.hdfs.DFSClient: Failed to
connect to /10.0.241.39:50010 for block, add to deadNodes and continue.
java.io.IOException: Got error for OP_READ_BLOCK, self=/10.0.241.39:53420,
remote=/10.0.241.39:50010, for file
/hbase/data/default/table7/12a2d1e37fd8f0f9870fc1b5afd6046d/i/983cf03fddfa480f92346f25a61b0b9e,
for pool BP-1892992341-10.10.122.111-1352825964285 block
1195579097_1099777976128
java.io.IOException: Got error for OP_READ_BLOCK, self=/10.0.241.39:53420,
remote=/10.0.241.39:50010, for file
/hbase/data/default/table7/12a2d1e37fd8f0f9870fc1b5afd6046d/i/983cf03fddfa480f92346f25a61b0b9e,
for pool BP-1892992341-10.10.122.111-1352825964285 block
11955

Re: JvmPauseMonitor

2015-07-24 Thread Vladimir Rodionov
Hi, jeevi

Is there any reason you are testing an ancient, unsupported version of HBase?

-Vlad



Hbase major compaction question

2015-07-24 Thread apratim sharma
I have an HBase table with wide rows, almost 2K columns per row. Each
KV is approximately 2.1 KB in size.
I have populated this table with generated HFiles using an MR job.
There are no write or mutate operations performed on this table.

So once I am done with a major compaction of this table, ideally no further
major or minor compaction should be required if the table is not modified.
What I observe is that if I make some configuration change that requires
restarting my HBase service, then after the restart the compaction on the
table is gone.
And if I start a major compaction on the table again, it again takes a long
time to compact the table.

Is this expected behavior? I am curious what causes the major compaction to
take a long time if nothing has changed on the table.


I would really appreciate any help.


Thanks

Apratim
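
Not an explanation of the behaviour, but a small sketch of how a major
compaction can be requested and its progress checked from the Java client
(1.0-style Admin API; the table name is a placeholder), which makes it easier
to see whether the second compaction is really still running:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.protobuf.generated.AdminProtos.GetRegionInfoResponse.CompactionState;

public class MajorCompactionCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      TableName table = TableName.valueOf("my_table"); // hypothetical table name
      admin.majorCompact(table); // queues a major compaction request; returns immediately
      Thread.sleep(10000);       // give the region servers a moment to pick up the request
      // Poll until no region of the table reports a compaction in progress.
      while (admin.getCompactionState(table) != CompactionState.NONE) {
        Thread.sleep(10000);
      }
      System.out.println("Major compaction finished for " + table);
    }
  }
}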


Re: Hbase major compaction question

2015-07-24 Thread Ted Yu

Can you provide us with some more information:
Release of HBase you use
Configuration change you made prior to restarting
By 'compaction is gone', do you mean that locality became poor again?

Can you pastebin the region server log from when the compaction got stuck?

Thanks



Re: How to use patch-13090 in hbase-0.98.6.1

2015-07-24 Thread Ted Yu

You can do a rolling upgrade from the 0.98.6.1 release to the 1.1.0 release. 

Cheers



Re: Hbase major compaction question

2015-07-24 Thread apratim sharma
Hi Ted,

Please find my answers below.

*Release of HBase:* 1.0.0-cdh5.4.1
*Configuration change before restart:* Changed block cache related
configuration (mainly increased the off-heap bucket cache size)
*'Compaction gone' means:* Yes, data locality became poor after the restart.

Please find a log snippet pasted below from while compaction was running after
the restart.
Looking at this log, I guess this has something to do with the HFile
permissions. Maybe it's not able to delete or modify the files during
compaction. In spite of that, it reports the compaction as 100% complete; only
after a restart does it have to redo the work, because it failed to archive the
old HFiles.

I will try once again after changing the file permissions and update you.

I have thousands of occurrences of the log entry below in the region server
log file. I'm pasting just one.


Thanks a lot for help
Apratim

1:35:55.086 PM WARN org.apache.hadoop.hbase.backup.HFileArchiver
Failed to archive class
org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://
lnxcdh03.emeter.com:8020/hbase/data/apratim/sdp/f5bbbf1ff78935dab7093517dffa44f6/m/3aff94a0594345968ac373179c629126_SeqId_6_
on try #0
org.apache.hadoop.security.AccessControlException: Permission denied:
user=hbase, access=WRITE,
inode="/hbase/data/apratim/sdp/f5bbbf1ff78935dab7093517dffa44f6/m/3aff94a0594345968ac373179c629126_SeqId_6_":aparsh:aparsh:-rw-r--r--
at
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
at
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
at
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:151)
at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6596)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6578)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6503)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setTimesInt(FSNamesystem.java:2209)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setTimes(FSNamesystem.java:2187)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setTimes(NameNodeRpcServer.java:1088)
at
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.setTimes(AuthorizationProviderProxyClientProtocol.java:600)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setTimes(ClientNamenodeProtocolServerSideTranslatorPB.java:892)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)

at sun.reflect.GeneratedConstructorAccessor35.newInstance(Unknown Source)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSClient.setTimes(DFSClient.java:2829)
at
org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1343)
at
org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1339)
at
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at
org.apache.hadoop.hdfs.DistributedFileSystem.setTimes(DistributedFileSystem.java:1339)
at org.apache.hadoop.fs.FilterFileSystem.setTimes(FilterFileSystem.java:484)
at
org.apache.hadoop.hbase.util.FSUtils.renameAndSetModifyTime(FSUtils.java:1719)
at
org.apache.hadoop.hbase.backup.HFileArchiver$File.moveAndClose(HFileArchiver.java:586)
at
org.apache.hadoop.hbase.backup.HFileArchiver.resolveAndArchiveFile(HFileArchiver.java:425)
at
org.apache.hadoop.hbase.backup.HFileArchiver.resolveAndArchive(HFileArchiver.java:335)
at
org.apache.hadoop.hbase.backup.HFileArchiver.resolveAndArchive(HFileArchiver.java:284)
at
org.apache.hadoop.hbase.backup.HFileArchiver.archiveStoreFiles(HFileArchiver.java:231)
at
org.apache.hadoop.hbase.regionserver.HRegionFileSystem.removeStoreFiles(HRegionFileSystem.java:424)
at
org.apache.hadoop.hbase.regionserver.HSto

Re[2]: Hbase major compaction question

2015-07-24 Thread Ted Yu

Please change the permissions for the other files owned by aparsh as well. 
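
On the HDFS side, something along these lines (run as an HDFS superuser)
should hand those bulk-loaded files over to the region server user so it can
archive them; the group name is a guess and may differ on your cluster:

hdfs dfs -chown -R hbase:hbase /hbase/data/apratim/sdp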

Cheers
