Re: Parallel scanning of snapshots using hbase client api

2016-03-03 Thread Deepti Bhogle
We upgraded the library to 5.4.8, and although the parallel-scanning exception 
issue is fixed, we see a bigger problem when running multiple scanners. 
Our intended use for a single scanner was to have the snapshot scanner scan a 
given range of keys, using the region info for where those keys are located, 
rather than scanning through the whole set of regions to find them. 
Since there is currently no way to restrict the scan to specific regions, 
running even a single scanner ends up scanning through all the regions of the 
given table. When we then create multiple scanners, this puts a massive I/O 
load on the NameNode of our cluster.
Would this be a valid use case for the snapshot scanner? Do we already have, 
or could we add, the capability for the snapshot scanner to scan only a given 
region rather than the whole table? 
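For context, here is a minimal sketch of how we open a single scanner today 
(the snapshot name, restore path, and key range below are placeholders, not 
our real values):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.TableSnapshotScanner;
import org.apache.hadoop.hbase.util.Bytes;

public class SnapshotRangeScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // The key range we actually need -- placeholder boundaries.
    Scan scan = new Scan(Bytes.toBytes("row-00000"), Bytes.toBytes("row-10000"));

    // Restore directory for this scanner instance -- placeholder path.
    Path restoreDir = new Path("/tmp/snapshot-restore/scanner-0");

    // Even though the Scan is bounded to one key range, opening the scanner
    // restores every region of the snapshot under restoreDir.
    try (TableSnapshotScanner scanner =
             new TableSnapshotScanner(conf, restoreDir, "my_snapshot", scan)) {
      for (Result result : scanner) {
        // process result
      }
    }
  }
}

Even with the Scan bounded to a narrow key range, opening the scanner restores 
every region of the snapshot, which is what drives the NameNode load once we 
run several of these in parallel.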
Thanks,
Deepti Bhogle | Senior Software Engineer | Connexity, Inc.
Phone: 858.652.4318 | dbho...@connexity.com







Re: Parallel scanning of snapshots using hbase client api

2016-01-08 Thread Deepti Bhogle
We may not be able to upgrade to 5.4.8, since our 5.4.5 upgrade is pretty 
recent. I will raise the possibility of upgrading if the patch doesn't work for us. 
Deepti Bhogle | Senior Software Engineer | Connexity, Inc.
Phone: 858.652.4318 | dbho...@connexity.com







Re: Parallel scanning of snapshots using hbase client api

2016-01-08 Thread Matteo Bertozzi
I think HBASE-14302 solved the problem by cutting out the link creation.
If you are using CDH 5.4.5, as mentioned above, try upgrading to 5.4.8+.

Matteo



Re: Parallel scanning of snapshots using hbase client api

2016-01-08 Thread Enis Söztutar
Thanks, Ted, for the link.

@Deepti, can you please test the patch and report your findings back here or
in the JIRA? We should commit the patch.

Enis


Re: Parallel scanning of snapshots using hbase client api

2016-01-07 Thread Deepti Bhogle
Yes, I think it's a similar issue. The JIRA mentions a patch that should 
contain the fix for TableSnapshotScanner; we will check whether we can apply 
it to our current version. 
Thanks,
Deepti Bhogle | Senior Software Engineer | Connexity, Inc.
Phone: 858.652.4318 | dbho...@connexity.com








Re: Parallel scanning of snapshots using hbase client api

2016-01-07 Thread Enis Söztutar
TableSnapshotScanner itself does not support more than one scanner. Are you
creating more than one TableSnapshotScanner in your parallel scan?

Every time a snapshot scanner is initialized, it tries to "restore" the
snapshot to a temporary location outside the regular root directory in HDFS.
You can try giving a different restore directory to each
TableSnapshotScanner.
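Roughly something like this (an untested sketch, not a drop-in fix; the
snapshot name, base restore path, and split points are placeholders):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.TableSnapshotScanner;

public class PerScannerRestoreDirs {
  // One scanner per key range, each restoring the snapshot into its own
  // directory. splitPoints holds the range boundaries: [start0, start1, ..., end].
  public static TableSnapshotScanner[] openScanners(byte[][] splitPoints)
      throws IOException {
    Configuration conf = HBaseConfiguration.create();
    Path baseRestoreDir = new Path("/tmp/snapshot-restore");
    TableSnapshotScanner[] scanners = new TableSnapshotScanner[splitPoints.length - 1];
    for (int i = 0; i < scanners.length; i++) {
      Scan scan = new Scan(splitPoints[i], splitPoints[i + 1]);
      // A distinct restore directory for each scanner instance.
      Path restoreDir = new Path(baseRestoreDir, "scanner-" + i);
      scanners[i] = new TableSnapshotScanner(conf, restoreDir, "my_snapshot", scan);
    }
    return scanners;
  }
}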

Enis

On Wed, Jan 6, 2016 at 5:51 PM, dbhogle  wrote:

> Using the client API, we can scan a snapshot with a single scanner. We
> currently create an instance of TableSnapshotScanner, providing it with a
> unique dir location per scanner. We are on CDH 5.4.5 and using the HBase
> 1.0.0-cdh5.4.5 API. To get the desired throughput we tried to increase the
> number of parallel scanners, but raising the number of scanner instances
> throws the following exception, and it occurs more often as the number of
> scanners goes up.
> Can the client API support multiple scanners for a single snapshot?
>
> java.io.IOException: java.util.concurrent.ExecutionException:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
> failed to create file
> /hbase/archive/data/default//cb794cfb7948ba8b1f4e73b690dfbfe5/L/.links-a04e1a5b2141445eb1b9e2429f1eced2/cb794cfb7948ba8b1f4e73b690dfbfe5.
> for DFSClient_NONMAPREDUCE_672650916_1 for client 10.10.2.90 because current
> leaseholder is trying to recreate file.
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3077)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2783)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2676)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2560)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:585)
>         at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.create(AuthorizationProviderProxyClientProtocol.java:110)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:395)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
>
>         at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:162)
>         at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.cloneHdfsRegions(RestoreSnapshotHelper.java:561)
>         at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:237)
>         at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:159)
>         at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:812)
>         at org.apache.hadoop.hbase.client.TableSnapshotScanner.init(TableSnapshotScanner.java:156)
>         at org.apache.hadoop.hbase.client.TableSnapshotScanner.<init>(TableSnapshotScanner.java:124)
>         at org.apache.hadoop.hbase.client.TableSnapshotScanner.<init>(TableSnapshotScanner.java:101)
>         at net.connexity.aro.data.AudienceScanner.join(AudienceScanner.scala:68)
>         at net.connexity.aro.actor.ScanActor.joinAudience(ScanActor.scala:190)
>         at net.connexity.aro.actor.ScanActor$$anonfun$receive$1.applyOrElse(ScanActor.scala:90)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
>         at net.connexity.aro.actor.ScanActor.aroundReceive(ScanActor.scala:36)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:220)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at ...

Re: Parallel scanning of snapshots using hbase client api

2016-01-07 Thread Deepti Bhogle
We currently do exactly that: we create multiple instances of 
TableSnapshotScanner, each with a unique restore directory, against the same 
snapshot. But doing so gives us the exception mentioned. Does that mean we 
can't run multiple instances at the same time? 
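To make the pattern concrete, this is roughly what we run (a simplified 
sketch; the snapshot name, restore paths, and degree of parallelism are 
placeholders, and our real tasks scan distinct key ranges rather than the 
whole table):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.TableSnapshotScanner;

public class ParallelSnapshotScan {
  public static void main(String[] args) throws Exception {
    final Configuration conf = HBaseConfiguration.create();
    final int parallelism = 8; // placeholder
    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    List<Future<Long>> rowCounts = new ArrayList<Future<Long>>();

    for (int i = 0; i < parallelism; i++) {
      // A unique restore directory per task -- placeholder base path.
      final Path restoreDir = new Path("/tmp/snapshot-restore/scanner-" + i);
      rowCounts.add(pool.submit(new Callable<Long>() {
        @Override
        public Long call() throws Exception {
          // Each task opens its own scanner against the same snapshot.
          try (TableSnapshotScanner scanner =
                   new TableSnapshotScanner(conf, restoreDir, "my_snapshot", new Scan())) {
            long rows = 0;
            for (Result r : scanner) {
              rows++;
            }
            return rows;
          }
        }
      }));
    }

    long total = 0;
    for (Future<Long> f : rowCounts) {
      total += f.get();
    }
    pool.shutdown();
    System.out.println("rows scanned: " + total);
  }
}

Each task gets its own restore directory, yet the exception shows up as soon 
as more than one of these restores runs at the same time.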
Deepti Bhogle | Senior Software Engineer | Connexity, Inc.
Phone: 858.652.4318 | dbho...@connexity.com








Re: Parallel scanning of snapshots using hbase client api

2016-01-07 Thread Ted Yu
This seems related:

HBASE-14128

FYI
