We upgraded the lib version to 5.4.8, and although the parallel-scanning 
exception is fixed, we now see a bigger problem with running multiple scanners. 
Our intended use for a single scanner was to have the snapshot scanner scan a 
given range of keys, along with the info for the regions they are located on, 
without having to scan through the whole set of regions to locate the keys. 
Since there is currently no way to specify the region server to scan, running 
even a single scanner results in scanning through all the regions for the given 
table. When we then create multiple scanners, this causes a massive IO load on 
the NameNode of our cluster.
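For context, each of our scanners already gets its own restore directory so the 
restores don't collide. A minimal sketch of that setup (not our actual code: the 
class, base path, and snapshot name are hypothetical, and the 
TableSnapshotScanner construction is shown only as a comment since it needs a 
live cluster):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class RestoreDirs {
    // Build a unique HDFS restore directory per scanner instance so that
    // concurrent snapshot restores never collide on the same files.
    static List<String> uniqueRestoreDirs(String baseDir, int numScanners) {
        List<String> dirs = new ArrayList<>();
        for (int i = 0; i < numScanners; i++) {
            dirs.add(baseDir + "/restore-" + i + "-" + UUID.randomUUID());
        }
        return dirs;
    }

    public static void main(String[] args) {
        List<String> dirs = uniqueRestoreDirs("/tmp/snapshot-scans", 4);
        // Each scanner would then be built against its own directory, roughly:
        //   new TableSnapshotScanner(conf, new Path(dirs.get(i)),
        //                            "mySnapshot", scan);
        System.out.println(dirs.size());
    }
}
```

Even with unique directories, every restore still touches all of the snapshot's 
region files, which is where the NameNode load comes from.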
Would this be a valid use case for the snapshot scanner? Do we already have (or 
can we add) the capability to let the snapshot scanner scan only a given region 
rather than the whole table? 
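To illustrate what we are after: if the scanner knew the region boundaries up 
front, it could skip every region whose key range does not intersect the 
requested [startRow, stopRow) interval, instead of touching all of them. A 
sketch of that overlap check using byte-wise unsigned comparison, which is how 
HBase orders row keys (the helper names are ours, not part of the HBase API):

```java
import java.nio.charset.StandardCharsets;

public class RegionFilter {
    // Lexicographic compare of unsigned bytes, matching HBase row-key order.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // A region [regionStart, regionEnd) overlaps a scan [scanStart, scanStop)
    // unless the region ends at/before the scan begins, or starts at/after the
    // scan stops. An empty regionEnd or scanStop means "unbounded above".
    static boolean regionOverlapsScan(byte[] regionStart, byte[] regionEnd,
                                      byte[] scanStart, byte[] scanStop) {
        boolean endsBeforeScan =
            regionEnd.length > 0 && compare(regionEnd, scanStart) <= 0;
        boolean startsAfterScan =
            scanStop.length > 0 && compare(regionStart, scanStop) >= 0;
        return !endsBeforeScan && !startsAfterScan;
    }

    static byte[] b(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        // Region ["m","z") overlaps scan ["a","n"); region ["n","z") does not.
        System.out.println(regionOverlapsScan(b("m"), b("z"), b("a"), b("n"))); // true
        System.out.println(regionOverlapsScan(b("n"), b("z"), b("a"), b("n"))); // false
    }
}
```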
Thanks,
Deepti Bhogle | Senior Software Engineer | Connexity, Inc.
Phone: 858.652.4318 | dbho...@connexity.com






On 1/8/16, 11:53 AM, "Deepti Bhogle" <dbho...@connexity.com> wrote:

>We may not be able to upgrade to 5.4.8, since our 5.4.5 upgrade is pretty 
>recent. I will discuss the possibility if the patch doesn’t work for us. 
>Deepti Bhogle | Senior Software Engineer | Connexity, Inc.
>Phone: 858.652.4318 | dbho...@connexity.com
>
>
>
>
>
>
>On 1/8/16, 10:26 AM, "Matteo Bertozzi" <theo.berto...@gmail.com> wrote:
>
>>I think HBASE-14302 solved the problem by cutting the link creation.
>>If you are using CDH 5.4.5 as mentioned above, try upgrading to 5.4.8+.
>>
>>Matteo
>>
>>
>>On Fri, Jan 8, 2016 at 6:55 AM, Enis Söztutar <e...@apache.org> wrote:
>>
>>> Thanks Ted for the link.
>>>
>>> @Deepti, can you please test the patch and report back here or in the JIRA
>>> with your findings? We should commit the patch.
>>>
>>> Enis
>>>
>>> On Thu, Jan 7, 2016 at 4:50 PM, Deepti Bhogle <dbho...@connexity.com>
>>> wrote:
>>>
>>> > Yes, I think it's a similar issue. The JIRA mentions a patch that should
>>> > fix TableSnapshotScanner; we will check whether we can apply the patch to
>>> > our current version.
>>> > Thanks,
>>> > Deepti Bhogle | Senior Software Engineer | Connexity, Inc.
>>> > Phone: 858.652.4318 | dbho...@connexity.com
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On 1/7/16, 9:35 AM, "Ted Yu" <yuzhih...@gmail.com> wrote:
>>> >
>>> > >This seems related:
>>> > >
>>> > >HBASE-14128
>>> > >
>>> > >FYI
>>> > >
>>> > >On Thu, Jan 7, 2016 at 9:24 AM, Deepti Bhogle <dbho...@connexity.com>
>>> > wrote:
>>> > >
>>> > >> We currently do exactly that. We create multiple instances of
>>> > >> TableSnapshotScanner, each with a unique dir location against the same
>>> > >> snapshot. But doing so gives us the exception mentioned. Does that mean
>>> > >> we can't run multiple instances at the same time?
>>> > >> Deepti Bhogle | Senior Software Engineer | Connexity, Inc.
>>> > >> Phone: 858.652.4318 | dbho...@connexity.com
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >> On 1/7/16, 3:46 AM, "Enis Söztutar" <e...@apache.org> wrote:
>>> > >>
>>> > >> >TableSnapshotScanner itself does not support more than one scanner. Are
>>> > >> >you creating more than 1 TableSnapshotScanner in your parallel scan?
>>> > >> >
>>> > >> >Every time a snapshot scanner is initiated, it will try to "restore" the
>>> > >> >snapshot to a temporary location outside of the regular root directory
>>> > >> >in hdfs. You can try to give different restore directories to each
>>> > >> >TableSnapshotScanner.
>>> > >> >
>>> > >> >Enis
>>> > >> >
>>> > >> >On Wed, Jan 6, 2016 at 5:51 PM, dbhogle <dbho...@connexity.com>
>>> wrote:
>>> > >> >
>>> > >> >> Using the client api, we can scan the snapshot using a single
>>> > scanner.
>>> > >> We
>>> > >> >> currently create an instance of TableSnapshotScanner providing it
>>> > with a
>>> > >> >> unique dir location per scanner. We are currently on cdh 5.4.5 and
>>> > using
>>> > >> >> the
>>> > >> >> hbase 1.0.0-cdh5.4.5 api. In order to get desired throughput, we
>>> > tried
>>> > >> to
>>> > >> >> increase the number of parallel scanners but raising the no. of
>>> > scanner
>>> > >> >> instances throws the following exception, the occurrence increases
>>> as
>>> > >> the
>>> > >> >> no. of scanners goes up.
>>> > >> >> Can the client api support multiple scanners for a single snapshot?
>>> > >> >>
>>> > >> >> java.io.IOException: java.util.concurrent.ExecutionException:
>>> > >> >> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
>>> > >> >> failed to create file
>>> > >> >> /hbase/archive/data/default/<tableName>/cb794cfb7948ba8b1f4e73b690dfbfe5/L/.links-a04e1a5b2141445eb1b9e2429f1eced2/cb794cfb7948ba8b1f4e73b690dfbfe5.<tableName>
>>> > >> >> for DFSClient_NONMAPREDUCE_672650916_1 for client 10.10.2.90 because
>>> > >> >> current leaseholder is trying to recreate file.
>>> > >> >>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3077)
>>> > >> >>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2783)
>>> > >> >>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2676)
>>> > >> >>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2560)
>>> > >> >>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:585)
>>> > >> >>         at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.create(AuthorizationProviderProxyClientProtocol.java:110)
>>> > >> >>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:395)
>>> > >> >>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>> > >> >>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>>> > >> >>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
>>> > >> >>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>> > >> >>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
>>> > >> >>         at java.security.AccessController.doPrivileged(Native Method)
>>> > >> >>         at javax.security.auth.Subject.doAs(Subject.java:415)
>>> > >> >>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>>> > >> >>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
>>> > >> >>
>>> > >> >>         at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:162)
>>> > >> >>         at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.cloneHdfsRegions(RestoreSnapshotHelper.java:561)
>>> > >> >>         at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:237)
>>> > >> >>         at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:159)
>>> > >> >>         at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:812)
>>> > >> >>         at org.apache.hadoop.hbase.client.TableSnapshotScanner.init(TableSnapshotScanner.java:156)
>>> > >> >>         at org.apache.hadoop.hbase.client.TableSnapshotScanner.<init>(TableSnapshotScanner.java:124)
>>> > >> >>         at org.apache.hadoop.hbase.client.TableSnapshotScanner.<init>(TableSnapshotScanner.java:101)
>>> > >> >>         at net.connexity.aro.data.AudienceScanner.join(AudienceScanner.scala:68)
>>> > >> >>         at net.connexity.aro.actor.ScanActor.joinAudience(ScanActor.scala:190)
>>> > >> >>         at net.connexity.aro.actor.ScanActor$$anonfun$receive$1.applyOrElse(ScanActor.scala:90)
>>> > >> >>         at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
>>> > >> >>         at net.connexity.aro.actor.ScanActor.aroundReceive(ScanActor.scala:36)
>>> > >> >>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>> > >> >>         at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>> > >> >>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
>>> > >> >>         at akka.dispatch.Mailbox.run(Mailbox.scala:220)
>>> > >> >>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
>>> > >> >>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> > >> >>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>> > >> >>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> > >> >>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >> --
>>> > >> >> View this message in context:
>>> > >> >> http://apache-hbase.679495.n3.nabble.com/Parallel-scanning-of-snapshots-using-hbase-client-api-tp4077014.html
>>> > >> >> Sent from the HBase Developer mailing list archive at Nabble.com.
>>> > >> >>
>>> > >>
>>> >
>>>
