Re: Phoenix JDBC Connection Warmup

2019-01-30 Thread Jaanai Zhang
This is expected when tables are first queried after establishing the
connection. The client loads some metadata into its local cache, which takes
some time, mainly in two aspects: 1. accessing the SYSTEM.CATALOG table to
get the table's schema information, and 2. accessing the HBase meta table to
get the table's region information.
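The caching behavior described above can be sketched as a lazy local cache: the first query for a table pays for the remote metadata fetches, and subsequent lookups hit the cache. This is an illustrative model only; `MetadataCache` and `slow_fetch` are hypothetical names, not Phoenix client internals.

```python
import time

class MetadataCache:
    """Hypothetical sketch of a lazily populated table-metadata cache."""
    def __init__(self, fetch_fn):
        self._fetch = fetch_fn   # stand-in for reading SYSTEM.CATALOG / hbase:meta
        self._cache = {}

    def get(self, table):
        if table not in self._cache:          # cold: slow remote fetch
            self._cache[table] = self._fetch(table)
        return self._cache[table]             # warm: local lookup

def slow_fetch(table):
    time.sleep(0.05)  # stand-in for the remote metadata round trips
    return {"table": table, "columns": ["ID", "VAL"], "regions": 4}

cache = MetadataCache(slow_fetch)
t0 = time.time(); cache.get("MYTABLE"); cold = time.time() - t0
t0 = time.time(); cache.get("MYTABLE"); warm = time.time() - t0
print(cold > warm)  # the first access is the slow one
```

The same effect explains the warmup window William observed: after a restart the cache is empty, so early statements repeatedly pay the metadata cost until it is populated.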


   Jaanai Zhang
   Best regards!



William Shen wrote on Thu, Jan 31, 2019 at 1:37 PM:

> Hi there,
>
> I have a component that makes Phoenix queries via the Phoenix JDBC
> Connection. I noticed that consistently, the Phoenix Client takes longer to
> execute a PreparedStatement and it takes longer to read through the
> ResultSet for a period of time (~15m) after a restart of the component. It
> seems like there is a warmup period for the JDBC connection. Is this to be
> expected?
>
> Thanks!
>


Phoenix JDBC Connection Warmup

2019-01-30 Thread William Shen
Hi there,

I have a component that makes Phoenix queries via the Phoenix JDBC
Connection. I noticed that consistently, the Phoenix Client takes longer to
execute a PreparedStatement and it takes longer to read through the
ResultSet for a period of time (~15m) after a restart of the component. It
seems like there is a warmup period for the JDBC connection. Is this to be
expected?

Thanks!


Re: client does not have phoenix.schema.isNamespaceMappingEnabled

2019-01-30 Thread Ajit Bhingarkar
user-unsubscr...@phoenix.apache.org

On Fri, Nov 30, 2018 at 12:04 AM M. Aaron Bossert 
wrote:

> So, sorry for the super late reply...there is weird lag between the time a
> message is sent or received to this mailing list and when I actually see
> it...But, I have got it working now as follows:
>
>
> HADOOP_CLASSPATH=/usr/hdp/3.0.1.0-187/hbase/lib/hbase-protocol.jar:/etc/hbase/3.0.1.0-187/0/ hadoop jar ...
>
> using this did not work:
>
> HADOOP_CLASSPATH="$(hbase mapredcp)" hadoop jar ...
>
>
> the output of that command separately is this:
>
> [mabossert@edge-3 lanl_data]$ hbase mapredcp
>
>
> /usr/hdp/3.0.1.0-187/hbase/lib/hbase-shaded-protobuf-2.1.0.jar:/usr/hdp/3.0.1.0-187/zookeeper/zookeeper-3.4.6.3.0.1.0-187.jar:/usr/hdp/3.0.1.0-187/hbase/lib/htrace-core4-4.2.0-incubating.jar:/usr/hdp/3.0.1.0-187/hbase/lib/commons-lang3-3.6.jar:/usr/hdp/3.0.1.0-187/hbase/lib/hbase-server-2.0.0.3.0.1.0-187.jar:/usr/hdp/3.0.1.0-187/hbase/lib/hbase-protocol-shaded-2.0.0.3.0.1.0-187.jar:/usr/hdp/3.0.1.0-187/hbase/lib/hbase-hadoop2-compat-2.0.0.3.0.1.0-187.jar:/usr/hdp/3.0.1.0-187/hbase/lib/hbase-mapreduce-2.0.0.3.0.1.0-187.jar:/usr/hdp/3.0.1.0-187/hbase/lib/hbase-metrics-api-2.0.0.3.0.1.0-187.jar:/usr/hdp/3.0.1.0-187/hbase/lib/protobuf-java-2.5.0.jar:/usr/hdp/3.0.1.0-187/hbase/lib/metrics-core-3.2.1.jar:/usr/hdp/3.0.1.0-187/hbase/lib/jackson-databind-2.9.5.jar:/usr/hdp/3.0.1.0-187/hbase/lib/hbase-client-2.0.0.3.0.1.0-187.jar:/usr/hdp/3.0.1.0-187/hbase/lib/hbase-hadoop-compat-2.0.0.3.0.1.0-187.jar:/usr/hdp/3.0.1.0-187/hbase/lib/hbase-protocol-2.0.0.3.0.1.0-187.jar:/usr/hdp/3.0.1.0-187/hbase/lib/hbase-shaded-netty-2.1.0.jar:/usr/hdp/3.0.1.0-187/hbase/lib/hbase-shaded-miscellaneous-2.1.0.jar:/usr/hdp/3.0.1.0-187/hbase/lib/hbase-metrics-2.0.0.3.0.1.0-187.jar:/usr/hdp/3.0.1.0-187/hbase/lib/hbase-common-2.0.0.3.0.1.0-187.jar:/usr/hdp/3.0.1.0-187/hbase/lib/hbase-zookeeper-2.0.0.3.0.1.0-187.jar:/usr/hdp/3.0.1.0-187/hbase/lib/jackson-annotations-2.9.5.jar:/usr/hdp/3.0.1.0-187/hbase/lib/jackson-core-2.9.5.jar
>
> On Tue, Nov 27, 2018 at 4:26 PM Josh Elser  wrote:
>
>> To add a non-jar file to the classpath of a Java application, you must
>> add the directory containing that file to the classpath.
>>
>> Thus, the following is wrong:
>>
>> HADOOP_CLASSPATH=/usr/hdp/3.0.1.0-187/hbase/lib/hbase-protocol.jar:/etc/hbase/
>> 3.0.1.0-187/0/hbase-site.xml
>>
>> And should be:
>>
>> HADOOP_CLASSPATH=/usr/hdp/3.0.1.0-187/hbase/lib/hbase-protocol.jar:/etc/hbase/
>> 3.0.1.0-187/0/
>>
>> Most times, including the output of `hbase mapredcp` is sufficient ala
>>
>> HADOOP_CLASSPATH="$(hbase mapredcp)" hadoop jar ...
>>
>> On 11/27/18 10:48 AM, M. Aaron Bossert wrote:
>> > Folks,
>> >
>> > I have, I believe, followed all the directions for turning on namespace
>> > mapping as well as extra steps to (added classpath) required to use the
>> > mapreduce bulk load utility, but am still running into this error... I
>> > am running a Hortonworks cluster with both HDP v 3.0.1 and HDF
>> > components.
>> > Here is what I have tried:
>> >
>> >   * Checked that the proper hbase-site.xml (in my case:
>> > /etc/hbase/3.0.1.0-187/0/hbase-site.xml) file is being referenced
>> > when launching the mapreduce utility:
>> >
>> >
>> >  ...
>> >
>> > <property>
>> >   <name>phoenix.schema.isNamespaceMappingEnabled</name>
>> >   <value>true</value>
>> > </property>
>> >
>> > <property>
>> >   <name>phoenix.schema.mapSystemTablesToNamespace</name>
>> >   <value>true</value>
>> > </property>
>> >
>> >  ...
>> >
>> >   * added the appropriate classpath additions to the hadoop jar command
>> > (zookeeper quorum hostnames changed to remove my corporate network
>> > info as well as data directory):
>> >
>> >
>> HADOOP_CLASSPATH=/usr/hdp/3.0.1.0-187/hbase/lib/hbase-protocol.jar:/etc/hbase/
>> 3.0.1.0-187/0/hbase-site.xml
>> > hadoop jar
>> > /usr/hdp/3.0.1.0-187/phoenix/phoenix-5.0.0.3.0.1.0-187-client.jar
>> > org.apache.phoenix.mapreduce.CsvBulkLoadTool --table MYTABLE --input
>> > /ingest/MYCSV -z zk1,zk2,zk3 -g
>> >
>> >
>> > ...
>> >
>> >
>> > 18/11/27 15:31:48 INFO zookeeper.ReadOnlyZKClient: Close zookeeper
>> > connection 0x1d58d65f to master-1.punch.datareservoir.net:2181,
>> > master-2.punch.datareservoir.net:2181,
>> > master-3.punch.datareservoir.net:2181
>> >
>> > 18/11/27 15:31:48 INFO log.QueryLoggerDisruptor: Shutting down
>> > QueryLoggerDisruptor..
>> >
>> > Exception in thread "main" java.sql.SQLException: ERROR 726
>> > (43M10): Inconsistent namespace mapping properties. Cannot initiate
>> > connection as SYSTEM:CATALOG is found but client does not have
>> > phoenix.schema.isNamespaceMappingEnabled enabled
>> >
>> > at
>> >
>> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:494)
>> >
>> > at
>> >
>> 

Re: split count for mapreduce jobs with PhoenixInputFormat

2019-01-30 Thread Josh Elser
Please do not take this advice lightly. Adding (or increasing) salt 
buckets can have a serious impact on the execution of your queries.


On 1/30/19 5:33 PM, venkata subbarayudu wrote:
You may recreate the table with the SALT_BUCKETS table option to get a
reasonable number of regions, and you may try adding a secondary index to
make the query run faster in case your MapReduce job performs specific filters.


On Thu 31 Jan, 2019, 12:09 AM Thomas D'Silva  wrote:


If stats are enabled PhoenixInputFormat will generate a split per
guidepost.

On Wed, Jan 30, 2019 at 7:31 AM Josh Elser  wrote:

You can extend/customize the PhoenixInputFormat with your own
code to
increase the number of InputSplits and Mappers.

On 1/30/19 6:43 AM, Edwin Litterst wrote:
 > Hi,
 > I am using PhoenixInputFormat as input source for mapreduce jobs.
 > The split count (which determines how many mappers are used for the job)
 > is always equal to the number of regions of the table from where I
 > select the input.
 > Is there a way to increase the number of splits? My job is running too
 > slow with only one mapper for every region.
 > (Increasing the number of regions is no option.)
 > regards,
 > Eddie



Re: split count for mapreduce jobs with PhoenixInputFormat

2019-01-30 Thread Ankit Singhal
As Thomas said, the number of splits will be equal to the number of guideposts
available for the table, or the ones required to cover the filter.
If you are seeing one split per region, then either stats are disabled or the
guidepost width is set higher than the size of the region. Try reducing the
guidepost width and re-running UPDATE STATISTICS to rebuild the stats, check
after some time that the number of guideposts has increased by querying the
SYSTEM.STATS table, and then run the MR job.
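The relationship between guidepost width and split count can be sketched with a rough back-of-the-envelope model. This is not Phoenix's actual computation; `estimated_splits` is a hypothetical helper that only illustrates why shrinking the guidepost width raises the mapper count, and why a width larger than a region leaves you with one split per region.

```python
import math

def estimated_splits(table_size_bytes, region_size_bytes, guidepost_width):
    """Rough estimate: one split per guidepost interval, falling back to
    one split per region when no guidepost lands inside a region."""
    if guidepost_width >= region_size_bytes:
        # no guidepost falls inside a region: one split per region
        return math.ceil(table_size_bytes / region_size_bytes)
    return math.ceil(table_size_bytes / guidepost_width)

GB = 1024 ** 3
# 100 GB table, 10 GB regions, 100 MB guidepost width
print(estimated_splits(100 * GB, 10 * GB, 100 * 1024 ** 2))  # 1024 splits
# same table with a 20 GB guidepost width degenerates to per-region splits
print(estimated_splits(100 * GB, 10 * GB, 20 * GB))  # 10 splits
```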

On Wed, Jan 30, 2019 at 2:33 PM venkata subbarayudu 
wrote:

> You may recreate the table with the SALT_BUCKETS table option to get a
> reasonable number of regions, and you may try adding a secondary index to
> make the query run faster in case your MapReduce job performs specific filters.
>
> On Thu 31 Jan, 2019, 12:09 AM Thomas D'Silva  wrote:
>
>> If stats are enabled PhoenixInputFormat will generate a split per
>> guidepost.
>>
>> On Wed, Jan 30, 2019 at 7:31 AM Josh Elser  wrote:
>>
>>> You can extend/customize the PhoenixInputFormat with your own code to
>>> increase the number of InputSplits and Mappers.
>>>
>>> On 1/30/19 6:43 AM, Edwin Litterst wrote:
>>> > Hi,
>>> > I am using PhoenixInputFormat as input source for mapreduce jobs.
>>> > The split count (which determines how many mappers are used for the job)
>>> > is always equal to the number of regions of the table from where I
>>> > select the input.
>>> > Is there a way to increase the number of splits? My job is running too
>>> > slow with only one mapper for every region.
>>> > (Increasing the number of regions is no option.)
>>> > regards,
>>> > Eddie
>>>
>>


Re: split count for mapreduce jobs with PhoenixInputFormat

2019-01-30 Thread venkata subbarayudu
You may recreate the table with the SALT_BUCKETS table option to get a
reasonable number of regions, and you may try adding a secondary index to
make the query run faster in case your MapReduce job performs specific filters.

On Thu 31 Jan, 2019, 12:09 AM Thomas D'Silva wrote:

> If stats are enabled PhoenixInputFormat will generate a split per
> guidepost.
>
> On Wed, Jan 30, 2019 at 7:31 AM Josh Elser  wrote:
>
>> You can extend/customize the PhoenixInputFormat with your own code to
>> increase the number of InputSplits and Mappers.
>>
>> On 1/30/19 6:43 AM, Edwin Litterst wrote:
>> > Hi,
>> > I am using PhoenixInputFormat as input source for mapreduce jobs.
>> > The split count (which determines how many mappers are used for the job)
>> > is always equal to the number of regions of the table from where I
>> > select the input.
>> > Is there a way to increase the number of splits? My job is running too
>> > slow with only one mapper for every region.
>> > (Increasing the number of regions is no option.)
>> > regards,
>> > Eddie
>>
>


Re: split count for mapreduce jobs with PhoenixInputFormat

2019-01-30 Thread Thomas D'Silva
If stats are enabled PhoenixInputFormat will generate a split per
guidepost.

On Wed, Jan 30, 2019 at 7:31 AM Josh Elser  wrote:

> You can extend/customize the PhoenixInputFormat with your own code to
> increase the number of InputSplits and Mappers.
>
> On 1/30/19 6:43 AM, Edwin Litterst wrote:
> > Hi,
> > I am using PhoenixInputFormat as input source for mapreduce jobs.
> > The split count (which determines how many mappers are used for the job)
> > is always equal to the number of regions of the table from where I
> > select the input.
> > Is there a way to increase the number of splits? My job is running too
> > slow with only one mapper for every region.
> > (Increasing the number of regions is no option.)
> > regards,
> > Eddie
>


Re: split count for mapreduce jobs with PhoenixInputFormat

2019-01-30 Thread Josh Elser
You can extend/customize the PhoenixInputFormat with your own code to 
increase the number of InputSplits and Mappers.
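One way to picture such a customization: take each per-region split and subdivide its key range into k contiguous sub-ranges, so k mappers share one region. Below is a minimal sketch of just the subdivision step, using integers in place of real row-key byte arrays; `subdivide` is a hypothetical helper, not part of the Phoenix API.

```python
def subdivide(start, end, k):
    """Split the half-open key range [start, end) into k contiguous
    sub-ranges. Real HBase row keys are byte arrays; integers keep the
    idea simple."""
    width = (end - start) / k
    bounds = [start + round(i * width) for i in range(k)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))

# one region covering keys 0..1000 split into 4 sub-splits -> 4 mappers
print(subdivide(0, 1000, 4))  # [(0, 250), (250, 500), (500, 750), (750, 1000)]
```

A custom InputFormat built on this idea would emit one InputSplit per sub-range instead of one per region, at the cost of each mapper scanning only part of a region.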


On 1/30/19 6:43 AM, Edwin Litterst wrote:

Hi,
I am using PhoenixInputFormat as input source for mapreduce jobs.
The split count (which determines how many mappers are used for the job) 
is always equal to the number of regions of the table from where I 
select the input.
Is there a way to increase the number of splits? My job is running too 
slow with only one mapper for every region.

(Increasing the number of regions is no option.)
regards,
Eddie


split count for mapreduce jobs with PhoenixInputFormat

2019-01-30 Thread Edwin Litterst
Hi,

 

I am using PhoenixInputFormat as input source for mapreduce jobs.

The split count (which determines how many mappers are used for the job) is always equal to the number of regions of the table from where I select the input.

Is there a way to increase the number of splits? My job is running too slow with only one mapper for every region.

(Increasing the number of regions is no option.)

 

regards,

Eddie