Re: split count for mapreduce jobs with PhoenixInputFormat

2019-01-31 Thread Eddie

  
  
Maybe there are differences in the default values for
USE_STATS_FOR_PARALLELIZATION on different clusters running different
Hadoop versions. With Hadoop 3.1.x and Phoenix 5.0, useStatsForParallelization
is true by default and the number of splits = guidepost count + number of
regions.

I changed GUIDE_POSTS_WIDTH to another value:

ALTER TABLE  SET GUIDE_POSTS_WIDTH = 1024
UPDATE STATISTICS  ALL

Unfortunately this changed neither the guidepost count nor the split count.
Am I missing something here?
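For reference, a minimal sketch of pinning this behavior explicitly rather
than relying on version-dependent defaults. MY_TABLE and the ZooKeeper quorum
are placeholders, and both the USE_STATS_FOR_PARALLELIZATION table property
and the phoenix.use.stats.parallelization client property are assumptions
based on what Phoenix 4.12+ documents, so verify them against your version:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Properties;

public class PinStatsParallelization {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Client-side default for all tables (assumed property key).
        props.setProperty("phoenix.use.stats.parallelization", "true");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:phoenix:zk-host:2181", props);
             Statement stmt = conn.createStatement()) {
            // Per-table override; hypothetical table name.
            stmt.execute(
                "ALTER TABLE MY_TABLE SET USE_STATS_FOR_PARALLELIZATION = true");
        }
    }
}

The table-level property should win over the client-level default, so it is
the safer knob when different clusters disagree on their defaults.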




On 30.01.2019 at 19:38, Thomas D'Silva wrote:
> If stats are enabled, PhoenixInputFormat will generate a split per
> guidepost.
  
  
  
Re: split count for mapreduce jobs with PhoenixInputFormat

2019-01-31 Thread Eddie
I have done something similar in another case by subclassing DBInputFormat,
but I have no idea how this could be done with PhoenixInputFormat without
losing data locality (which is guaranteed as long as one split represents one
region). Could this be achieved somehow? (The keys are salted.)
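For reference, a minimal sketch of one way this might look: wrap
PhoenixInputFormat and cut each region-aligned split into fixed key
sub-ranges, carrying over the original split's length and host so the
locality hint survives. Because every salted key keeps its salt-byte prefix,
each sub-range still falls inside the original region. PhoenixInputSplit's
getScans() accessor and its (scans, length, location) constructor are
assumptions based on phoenix-core 5.0.x and should be verified against the
version on your classpath:

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.phoenix.mapreduce.PhoenixInputFormat;
import org.apache.phoenix.mapreduce.PhoenixInputSplit;

// Sketch only: cut each region-aligned Phoenix split into N sub-ranges while
// keeping the region's host as the locality hint. The PhoenixInputSplit API
// used here is an assumption based on phoenix-core 5.0.x.
public class SubdividedPhoenixInputFormat<T extends DBWritable>
        extends InputFormat<NullWritable, T> {

    private static final int PIECES_PER_SPLIT = 4; // tune for the workload

    private final PhoenixInputFormat<T> delegate = new PhoenixInputFormat<>();

    @Override
    public List<InputSplit> getSplits(JobContext context)
            throws IOException, InterruptedException {
        List<InputSplit> out = new ArrayList<>();
        for (InputSplit split : delegate.getSplits(context)) {
            PhoenixInputSplit ps = (PhoenixInputSplit) split;
            List<Scan> scans = ps.getScans();
            byte[] start = scans.get(0).getStartRow();
            byte[] stop = scans.get(scans.size() - 1).getStopRow();
            // Key-range splitting needs non-empty boundaries; keep the
            // original split for the first/last region and multi-scan splits.
            if (scans.size() != 1 || start.length == 0 || stop.length == 0) {
                out.add(split);
                continue;
            }
            byte[][] bounds = Bytes.split(start, stop, PIECES_PER_SPLIT - 1);
            if (bounds == null) { // range too narrow to split further
                out.add(split);
                continue;
            }
            String[] hosts = split.getLocations();
            String host = hosts.length > 0 ? hosts[0] : null;
            long length = split.getLength() / (bounds.length - 1);
            for (int i = 0; i < bounds.length - 1; i++) {
                Scan piece = new Scan(scans.get(0)); // copies filters etc.
                piece.withStartRow(bounds[i]);
                piece.withStopRow(bounds[i + 1]);
                out.add(new PhoenixInputSplit(
                        Collections.singletonList(piece), length, host));
            }
        }
        return out;
    }

    @Override
    public RecordReader<NullWritable, T> createRecordReader(InputSplit split,
            TaskAttemptContext context) throws IOException, InterruptedException {
        return delegate.createRecordReader(split, context);
    }
}

The mapper count then becomes roughly PIECES_PER_SPLIT times the region
count, while each task still reads only rows served by its region's host.
That said, the guidepost route discussed below achieves much the same thing
without custom code.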


On 31.01.2019 at 00:24, Josh Elser wrote:
> Please do not take this advice lightly. Adding (or increasing) salt
> buckets can have a serious impact on the execution of your queries.



Re: split count for mapreduce jobs with PhoenixInputFormat

2019-01-30 Thread Josh Elser
Please do not take this advice lightly. Adding (or increasing) salt 
buckets can have a serious impact on the execution of your queries.


On 1/30/19 5:33 PM, venkata subbarayudu wrote:
> You may recreate the table with the SALT_BUCKETS table option to get a
> reasonable number of regions, and you may try a secondary index to make the
> query run faster in case your MapReduce job performs specific filters.




Re: split count for mapreduce jobs with PhoenixInputFormat

2019-01-30 Thread Ankit Singhal
As Thomas said, the number of splits will be equal to the number of
guideposts available for the table, or the ones required to cover the filter.
If you are seeing one split per region, then either stats are disabled or the
guidepost width is set higher than the size of the region. Try reducing the
guidepost width and re-running UPDATE STATISTICS to rebuild the stats, check
after some time that the number of guideposts has increased by querying the
SYSTEM.STATS table, and then run the MR job.
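A sketch of that check over JDBC (MY_TABLE, the quorum, and the 10 MB width
are placeholders; SYSTEM.STATS holds roughly one row per guidepost per column
family, keyed by the table's physical name):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class GuidepostCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement()) {
            // Width is in bytes; 10 MB here vs. the ~100 MB default.
            stmt.execute("ALTER TABLE MY_TABLE SET GUIDE_POSTS_WIDTH = 10485760");
            stmt.execute("UPDATE STATISTICS MY_TABLE ALL");
            // Per the advice above, run this again after some time and
            // expect the count to grow as the width shrinks.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT COUNT(*) FROM SYSTEM.STATS"
                    + " WHERE PHYSICAL_NAME = 'MY_TABLE'")) {
                rs.next();
                System.out.println("guideposts: " + rs.getLong(1));
            }
        }
    }
}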

On Wed, Jan 30, 2019 at 2:33 PM venkata subbarayudu wrote:

> You may recreate the table with the SALT_BUCKETS table option to get a
> reasonable number of regions, and you may try a secondary index to make the
> query run faster in case your MapReduce job performs specific filters.


Re: split count for mapreduce jobs with PhoenixInputFormat

2019-01-30 Thread venkata subbarayudu
You may recreate the table with the SALT_BUCKETS table option to get a
reasonable number of regions, and you may try a secondary index to make the
query run faster in case your MapReduce job performs specific filters.
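A sketch of what that could look like; all names here are hypothetical, note
Josh's caveat elsewhere in this thread about the query-side cost of salting,
and SALT_BUCKETS can only be set at CREATE time, so the data must be reloaded:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RecreateSaltedTable {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement()) {
            // Pre-splits the table into 32 regions (salt buckets max out at 256).
            stmt.execute("CREATE TABLE MY_TABLE_V2 ("
                    + "  ID VARCHAR NOT NULL PRIMARY KEY,"
                    + "  CREATED_AT DATE,"
                    + "  PAYLOAD VARCHAR"
                    + ") SALT_BUCKETS = 32");
            // Secondary index on the filtered column; INCLUDE makes it
            // covering, so a filtering MR query need not touch the base table.
            stmt.execute("CREATE INDEX MY_TABLE_V2_CREATED_IDX"
                    + " ON MY_TABLE_V2 (CREATED_AT) INCLUDE (PAYLOAD)");
        }
    }
}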

On Thu 31 Jan, 2019, 12:09 AM Thomas D'Silva wrote:

> If stats are enabled, PhoenixInputFormat will generate a split per
> guidepost.


Re: split count for mapreduce jobs with PhoenixInputFormat

2019-01-30 Thread Thomas D'Silva
If stats are enabled, PhoenixInputFormat will generate a split per
guidepost.
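So the split count is decided client-side when the job is planned. A small
sketch of making sure the planning connection sees stats; the property key is
an assumption (the client-side setting behind UseStatsForParallelization in
Phoenix 4.12+), so verify it for your version:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class StatsAwareJobSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Presumably read by the Phoenix connection that PhoenixInputFormat
        // opens when it computes splits from the query plan (assumed key).
        conf.set("phoenix.use.stats.parallelization", "true");
        Job job = Job.getInstance(conf, "phoenix-mr");
        // ... PhoenixMapReduceUtil.setInput(...), mapper, output, etc.
    }
}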

On Wed, Jan 30, 2019 at 7:31 AM Josh Elser wrote:

> You can extend/customize the PhoenixInputFormat with your own code to
> increase the number of InputSplits and Mappers.


Re: split count for mapreduce jobs with PhoenixInputFormat

2019-01-30 Thread Josh Elser
You can extend/customize the PhoenixInputFormat with your own code to 
increase the number of InputSplits and Mappers.


On 1/30/19 6:43 AM, Edwin Litterst wrote:

> Hi,
> I am using PhoenixInputFormat as the input source for MapReduce jobs.
> The split count (which determines how many mappers are used for the job)
> is always equal to the number of regions of the table from which I select
> the input.
> Is there a way to increase the number of splits? My job is running too
> slow with only one mapper for every region.
> (Increasing the number of regions is not an option.)
> regards,
> Eddie


split count for mapreduce jobs with PhoenixInputFormat

2019-01-30 Thread Edwin Litterst
Hi,

I am using PhoenixInputFormat as the input source for MapReduce jobs.

The split count (which determines how many mappers are used for the job) is
always equal to the number of regions of the table from which I select the
input.

Is there a way to increase the number of splits? My job is running too slow
with only one mapper for every region.

(Increasing the number of regions is not an option.)

regards,
Eddie
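For readers landing here, a self-contained sketch of the setup being
described. Table, column, and path names are placeholders, and it assumes the
MR utilities shipped in phoenix-core (PhoenixMapReduceUtil.setInput with a
table name and a SELECT query, which installs PhoenixInputFormat on the job):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil;

public class PhoenixMrDriver {

    // Minimal value class: Phoenix populates it from each row of the query.
    public static class RowWritable implements DBWritable, Writable {
        String id;
        public void readFields(ResultSet rs) throws SQLException { id = rs.getString("ID"); }
        public void write(PreparedStatement ps) throws SQLException { ps.setString(1, id); }
        public void readFields(DataInput in) throws IOException { id = in.readUTF(); }
        public void write(DataOutput out) throws IOException { out.writeUTF(id); }
    }

    public static class RowMapper
            extends Mapper<NullWritable, RowWritable, Text, NullWritable> {
        @Override
        protected void map(NullWritable key, RowWritable row, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(new Text(row.id), NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "phoenix-input-example");
        job.setJarByClass(PhoenixMrDriver.class);
        // Sets PhoenixInputFormat as the job's InputFormat under the hood.
        PhoenixMapReduceUtil.setInput(job, RowWritable.class, "MY_TABLE",
                "SELECT ID FROM MY_TABLE");
        job.setMapperClass(RowMapper.class);
        job.setNumReduceTasks(0); // map-only
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path("/tmp/phoenix-mr-out"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Run with the Phoenix client jar on the job classpath (e.g. via hadoop jar
with HADOOP_CLASSPATH pointing at it); the thread above is about how many
mappers this job gets per table region.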