Re: IllegalStateException: Phoenix driver closed because server is shutting down

2018-09-19 Thread Sergey Soldatov
That message can be misleading. It actually means that a JVM shutdown has
been triggered (the runtime has executed the shutdown hook for the driver,
and that is the only place where we set this message) and that, afterwards,
another thread tried to create a new connection.
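
For illustration, a rough standalone sketch (not Phoenix's actual code) of the
pattern: a JVM shutdown hook marks the driver closed, and any thread that still
asks for a connection afterwards gets this kind of IllegalStateException.

// Rough standalone sketch (NOT Phoenix's actual implementation) of the
// pattern described above: a JVM shutdown hook closes the driver, and any
// thread that still tries to open a connection afterwards fails fast.
public class ShutdownHookSketch {

    static final class FakeDriver {
        private volatile boolean closed;

        FakeDriver() {
            // Phoenix registers a similar hook so resources are released on JVM exit.
            Runtime.getRuntime().addShutdownHook(
                    new Thread(() -> closed = true));
        }

        Object connect() {
            if (closed) {
                // The situation behind the reported exception: shutdown has
                // started, but another thread still asks for a connection.
                throw new IllegalStateException(
                        "Driver closed because server is shutting down");
            }
            return new Object(); // stand-in for a real JDBC connection
        }
    }

    public static void main(String[] args) {
        FakeDriver driver = new FakeDriver();
        driver.connect(); // fine before shutdown starts
        // Once SIGTERM/System.exit() triggers the hook, concurrent threads
        // calling connect() will see the IllegalStateException instead.
    }
}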

Thanks,
Sergey

On Wed, Sep 19, 2018 at 11:17 AM Batyrshin Alexander <0x62...@gmail.com>
wrote:

> Version:
>
> Phoenix-4.14.0-HBase-1.4
>
> Full trace is:
>
> java.lang.IllegalStateException: Phoenix driver closed because server is
> shutting down
> at
> org.apache.phoenix.jdbc.PhoenixDriver.throwDriverClosedException(PhoenixDriver.java:290)
> at
> org.apache.phoenix.jdbc.PhoenixDriver.checkClosed(PhoenixDriver.java:285)
> at
> org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:220)
> at java.sql.DriverManager.getConnection(DriverManager.java:664)
> at java.sql.DriverManager.getConnection(DriverManager.java:270)
> at
> x.persistence.phoenix.ConnectionManager.get(ConnectionManager.scala:12)
> at
> x.persistence.phoenix.PhoenixDao.$anonfun$count$1(PhoenixDao.scala:58)
> at
> scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:12)
> at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:655)
> at scala.util.Success.$anonfun$map$1(Try.scala:251)
> at scala.util.Success.map(Try.scala:209)
> at scala.concurrent.Future.$anonfun$map$1(Future.scala:289)
> at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)
> at
> scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>
>
>
>
> > On 19 Sep 2018, at 20:13, Josh Elser  wrote:
> >
> > What version of Phoenix are you using? Is this the full stack trace you
> see that touches Phoenix (or HBase) classes?
> >
> > On 9/19/18 12:42 PM, Batyrshin Alexander wrote:
> >> Is there any reason for this exception? Which server exactly is
> shutting down if we use a quorum of ZooKeepers?
> >> java.lang.IllegalStateException: Phoenix driver closed because server
> is shutting down at
> org.apache.phoenix.jdbc.PhoenixDriver.throwDriverClosedException(PhoenixDriver.java:290)
> at
> org.apache.phoenix.jdbc.PhoenixDriver.checkClosed(PhoenixDriver.java:285)
> at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:220) at
> java.sql.DriverManager.getConnection(DriverManager.java:664) at
> java.sql.DriverManager.getConnection(DriverManager.java:270)
>
>


Re: ABORTING region server and following HBase cluster "crash"

2018-09-15 Thread Sergey Soldatov
Obviously yes. If it's not configured, then the default handlers would be used
for index writes, which may lead to a distributed deadlock.
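
For reference, a sketch of the server-side hbase-site.xml entries described in
the Phoenix secondary indexing documentation (verify the property names against
the docs for your Phoenix version, and restart the region servers after the
change):

<!-- Server-side hbase-site.xml on every region server. Property names as
     documented for Phoenix secondary indexing; double-check them for your
     Phoenix/HBase versions. -->
<property>
  <name>hbase.region.server.rpc.scheduler.factory.class</name>
  <value>org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory</value>
</property>
<property>
  <name>hbase.rpc.controllerfactory.class</name>
  <value>org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory</value>
</property>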

Thanks,
Sergey

On Sat, Sep 15, 2018 at 11:36 AM Batyrshin Alexander <0x62...@gmail.com>
wrote:

> I've found that we still have not configured this:
>
> hbase.region.server.rpc.scheduler.factory.class
> = org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory
>
> Can this misconfiguration lead to our problems?
>
> On 15 Sep 2018, at 02:04, Sergey Soldatov 
> wrote:
>
> That was a real problem quite a long time ago (a couple of years?). I can't say
> for sure in which version it was fixed, but now indexes have priority
> over regular tables and their regions open first. So by the time we
> replay WALs for tables, all index regions are supposed to be online. If you
> see the problem on recent versions, that usually means the cluster is not
> healthy and some of the index regions are stuck in RIT state.
>
> Thanks,
> Sergey
>
> On Thu, Sep 13, 2018 at 8:12 PM Jonathan Leech  wrote:
>
>> This seems similar to a failure scenario I’ve seen a couple times. I
>> believe after multiple restarts you got lucky and tables were brought up by
>> Hbase in the correct order.
>>
>> What happens is some kind of semi-catastrophic failure where 1 or more
>> region servers go down with edits that weren’t flushed, and are only in the
>> WAL. These edits belong to regions whose tables have secondary indexes.
>> Hbase wants to replay the WAL before bringing up the region server. Phoenix
>> wants to talk to the index region during this, but can’t. It fails enough
>> times then stops.
>>
>> The more region servers / tables / indexes affected, the more likely that
>> a full restart will get stuck in a classic deadlock. A good old-fashioned
>> data center outage is a great way to get started with this kind of problem.
>> You might make some progress and get stuck again, or restart number N might
>> get those index regions initialized before the main table.
>>
>> The surefire way to recover a cluster in this condition is to
>> strategically disable all the tables that are failing to come up. You can
>> do this from the Hbase shell as long as the master is running. If I
>> remember right, it’s a pain since the disable command will hang. You might
>> need to disable a table, kill the shell, disable the next table, etc. Then
>> restart. You’ll eventually have a cluster with all the region servers
>> finally started, and a bunch of disabled regions. If you disabled index
>> tables, enable one, wait for it to become available (i.e., its WAL edits will
>> be replayed), then enable the associated main table and wait for it to come
>> online. If Hbase did its job without error, and your failure didn’t
>> include losing 4 disks at once, order will be restored. Lather, rinse,
>> repeat until everything is enabled and online.
>>
>>  A big enough failure sprinkled with a little bit of bad luck and
>> what seems to be a Phoenix flaw == deadlock trying to get HBASE to start
>> up. Fix by forcing the order that Hbase brings regions online. Finally,
>> never go full restart. 
>>
>> > On Sep 10, 2018, at 7:30 PM, Batyrshin Alexander <0x62...@gmail.com>
>> wrote:
>> >
>> > After the update, the Master web interface shows that every region server is
>> now on 1.4.7 and there are no RITs.
>> >
>> > The cluster recovered only after we restarted all region servers 4 times...
>> >
>> >> On 11 Sep 2018, at 04:08, Josh Elser  wrote:
>> >>
>> >> Did you update the HBase jars on all RegionServers?
>> >>
>> >> Make sure that you have all of the Regions assigned (no RITs). There
>> could be a pretty simple explanation as to why the index can't be written
>> to.
>> >>
>> >>> On 9/9/18 3:46 PM, Batyrshin Alexander wrote:
>> >>> Correct me if I'm wrong.
>> >>> But it looks like, if you have region servers A and B that each host index and
>> primary table regions, then a situation like this is possible:
>> >>> A and B are under writes on a table with indexes
>> >>> A crashes
>> >>> B fails on an index update because A is not operating, then B starts
>> aborting
>> >>> After restart, A tries to rebuild the index from the WAL, but B at this time is
>> aborting, so A starts aborting too
>> >>> From this moment nothing happens (0 requests to region servers) and A
>> and B are not responsive in the Master-status web interface
>> >>>> On 9 Sep 2018, at 04:38, Batyrshin Alexander <0x62...@gmail.com
>> <mailto:0x

Re: Issue with Restoration on Phoenix version 4.12

2018-09-14 Thread Sergey Soldatov
If you exported tables from 4.8 and are importing them into preexisting
tables in 4.12, make sure that you created those tables with COLUMN_ENCODED_BYTES
= 0 or have phoenix.default.column.encoded.bytes.attrib set to 0 in
hbase-site.xml.
I believe the problem you see is the column name encoding that is
enabled by default: your previous tables have the full column name as the
column qualifier, but the newer version of Phoenix expects to see the
column index there. More details about column encoding can be found at
https://phoenix.apache.org/columnencoding.html
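
As an illustration (hypothetical table and column names), the preexisting
target table would need to be created with encoding disabled, for example:

-- Sketch with hypothetical names: disable column-name encoding so cells
-- imported from a 4.8 export (full column names as qualifiers) stay readable.
CREATE TABLE MY_RESTORED_TABLE (
    ID BIGINT NOT NULL,
    CF.FIRST_NAME VARCHAR,
    CF.LAST_NAME VARCHAR
    CONSTRAINT PK PRIMARY KEY (ID)
) COLUMN_ENCODED_BYTES = 0;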

Thanks,
Sergey

On Fri, Sep 7, 2018 at 5:51 AM Azharuddin Shaikh <
azharuddin.sha...@resilinc.com> wrote:

> Hi All,
>
> We have upgraded the Phoenix version from 4.8 to 4.12 to resolve a duplicate
> count issue, but we are now facing an issue with restoration of tables on
> Phoenix version 4.12. Our HBase version is 1.2.3.
>
> We are using the HBase EXPORT/IMPORT utilities to export and import data into
> HBase tables. When we try to check the records (rows) using the Phoenix tool,
> it returns '0' records; a count query returns the correct count of records
> (rows) in the table, but we are not able to print the contents of the table.
>
> Is this a bug with Phoenix version 4.12, or is there a fix for this
> restoration issue? Please advise; your help would be greatly
> appreciated.
>
>
>
>
>
> --
> Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
>


Re: ABORTING region server and following HBase cluster "crash"

2018-09-14 Thread Sergey Soldatov
Forgot to mention: that kind of problem can be mitigated by increasing the
number of threads for opening regions. By default, it's 3 (?), but we haven't
seen any problems with increasing it up to several hundred for clusters
that have up to 2k regions per RS.
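
A sketch of the region server setting involved; I believe the relevant property
is hbase.regionserver.executor.openregion.threads, but please verify it for your
HBase version:

<!-- Region server hbase-site.xml. Assumed property name; verify against your
     HBase version. The default number of region-opening threads is small (3). -->
<property>
  <name>hbase.regionserver.executor.openregion.threads</name>
  <value>100</value>
</property>
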
Thanks,
Sergey

On Fri, Sep 14, 2018 at 4:04 PM Sergey Soldatov 
wrote:

> That was a real problem quite a long time ago (a couple of years?). I can't say
> for sure in which version it was fixed, but now indexes have priority
> over regular tables and their regions open first. So by the time we
> replay WALs for tables, all index regions are supposed to be online. If you
> see the problem on recent versions, that usually means the cluster is not
> healthy and some of the index regions are stuck in RIT state.
>
> Thanks,
> Sergey
>
> On Thu, Sep 13, 2018 at 8:12 PM Jonathan Leech  wrote:
>
>> This seems similar to a failure scenario I’ve seen a couple times. I
>> believe after multiple restarts you got lucky and tables were brought up by
>> Hbase in the correct order.
>>
>> What happens is some kind of semi-catastrophic failure where 1 or more
>> region servers go down with edits that weren’t flushed, and are only in the
>> WAL. These edits belong to regions whose tables have secondary indexes.
>> Hbase wants to replay the WAL before bringing up the region server. Phoenix
>> wants to talk to the index region during this, but can’t. It fails enough
>> times then stops.
>>
>> The more region servers / tables / indexes affected, the more likely that
>> a full restart will get stuck in a classic deadlock. A good old-fashioned
>> data center outage is a great way to get started with this kind of problem.
>> You might make some progress and get stuck again, or restart number N might
>> get those index regions initialized before the main table.
>>
>> The surefire way to recover a cluster in this condition is to
>> strategically disable all the tables that are failing to come up. You can
>> do this from the Hbase shell as long as the master is running. If I
>> remember right, it’s a pain since the disable command will hang. You might
>> need to disable a table, kill the shell, disable the next table, etc. Then
>> restart. You’ll eventually have a cluster with all the region servers
>> finally started, and a bunch of disabled regions. If you disabled index
>> tables, enable one, wait for it to become available (i.e., its WAL edits will
>> be replayed), then enable the associated main table and wait for it to come
>> online. If Hbase did its job without error, and your failure didn’t
>> include losing 4 disks at once, order will be restored. Lather, rinse,
>> repeat until everything is enabled and online.
>>
>>  A big enough failure sprinkled with a little bit of bad luck and
>> what seems to be a Phoenix flaw == deadlock trying to get HBASE to start
>> up. Fix by forcing the order that Hbase brings regions online. Finally,
>> never go full restart. 
>>
>> > On Sep 10, 2018, at 7:30 PM, Batyrshin Alexander <0x62...@gmail.com>
>> wrote:
>> >
>> > After the update, the Master web interface shows that every region server is
>> now on 1.4.7 and there are no RITs.
>> >
>> > The cluster recovered only after we restarted all region servers 4 times...
>> >
>> >> On 11 Sep 2018, at 04:08, Josh Elser  wrote:
>> >>
>> >> Did you update the HBase jars on all RegionServers?
>> >>
>> >> Make sure that you have all of the Regions assigned (no RITs). There
>> could be a pretty simple explanation as to why the index can't be written
>> to.
>> >>
>> >>> On 9/9/18 3:46 PM, Batyrshin Alexander wrote:
>> >>> Correct me if I'm wrong.
>> >>> But it looks like, if you have region servers A and B that each host index and
>> primary table regions, then a situation like this is possible:
>> >>> A and B are under writes on a table with indexes
>> >>> A crashes
>> >>> B fails on an index update because A is not operating, then B starts
>> aborting
>> >>> After restart, A tries to rebuild the index from the WAL, but B at this time is
>> aborting, so A starts aborting too
>> >>> From this moment nothing happens (0 requests to region servers) and A
>> and B are not responsive in the Master-status web interface
>> >>>> On 9 Sep 2018, at 04:38, Batyrshin Alexander <0x62...@gmail.com
>> <mailto:0x62...@gmail.com>> wrote:
>> >>>>
>> >>>> After the update we still can't recover the HBase cluster. Our region
>> servers keep ABORTING over and over:
>> >>>>

Re: ABORTING region server and following HBase cluster "crash"

2018-09-14 Thread Sergey Soldatov
That was a real problem quite a long time ago (a couple of years?). I can't say
for sure in which version it was fixed, but now indexes have priority
over regular tables and their regions open first. So by the time we
replay WALs for tables, all index regions are supposed to be online. If you
see the problem on recent versions, that usually means the cluster is not
healthy and some of the index regions are stuck in RIT state.

Thanks,
Sergey

On Thu, Sep 13, 2018 at 8:12 PM Jonathan Leech  wrote:

> This seems similar to a failure scenario I’ve seen a couple times. I
> believe after multiple restarts you got lucky and tables were brought up by
> Hbase in the correct order.
>
> What happens is some kind of semi-catastrophic failure where 1 or more
> region servers go down with edits that weren’t flushed, and are only in the
> WAL. These edits belong to regions whose tables have secondary indexes.
> Hbase wants to replay the WAL before bringing up the region server. Phoenix
> wants to talk to the index region during this, but can’t. It fails enough
> times then stops.
>
> The more region servers / tables / indexes affected, the more likely that
> a full restart will get stuck in a classic deadlock. A good old-fashioned
> data center outage is a great way to get started with this kind of problem.
> You might make some progress and get stuck again, or restart number N might
> get those index regions initialized before the main table.
>
> The surefire way to recover a cluster in this condition is to
> strategically disable all the tables that are failing to come up. You can
> do this from the Hbase shell as long as the master is running. If I
> remember right, it’s a pain since the disable command will hang. You might
> need to disable a table, kill the shell, disable the next table, etc. Then
> restart. You’ll eventually have a cluster with all the region servers
> finally started, and a bunch of disabled regions. If you disabled index
> tables, enable one, wait for it to become available (i.e., its WAL edits will
> be replayed), then enable the associated main table and wait for it to come
> online. If Hbase did its job without error, and your failure didn’t
> include losing 4 disks at once, order will be restored. Lather, rinse,
> repeat until everything is enabled and online.
>
>  A big enough failure sprinkled with a little bit of bad luck and
> what seems to be a Phoenix flaw == deadlock trying to get HBASE to start
> up. Fix by forcing the order that Hbase brings regions online. Finally,
> never go full restart. 
>
> > On Sep 10, 2018, at 7:30 PM, Batyrshin Alexander <0x62...@gmail.com>
> wrote:
> >
> > After the update, the Master web interface shows that every region server is
> now on 1.4.7 and there are no RITs.
> >
> > The cluster recovered only after we restarted all region servers 4 times...
> >
> >> On 11 Sep 2018, at 04:08, Josh Elser  wrote:
> >>
> >> Did you update the HBase jars on all RegionServers?
> >>
> >> Make sure that you have all of the Regions assigned (no RITs). There
> could be a pretty simple explanation as to why the index can't be written
> to.
> >>
> >>> On 9/9/18 3:46 PM, Batyrshin Alexander wrote:
> >>> Correct me if I'm wrong.
> >>> But it looks like, if you have region servers A and B that each host index and
> primary table regions, then a situation like this is possible:
> >>> A and B are under writes on a table with indexes
> >>> A crashes
> >>> B fails on an index update because A is not operating, then B starts
> aborting
> >>> After restart, A tries to rebuild the index from the WAL, but B at this time is
> aborting, so A starts aborting too
> >>> From this moment nothing happens (0 requests to region servers) and A
> and B are not responsive in the Master-status web interface
>  On 9 Sep 2018, at 04:38, Batyrshin Alexander <0x62...@gmail.com
> > wrote:
> 
>  After the update we still can't recover the HBase cluster. Our region servers
> keep ABORTING over and over:
> 
>  prod003:
>  Sep 09 02:51:27 prod003 hbase[1440]: 2018-09-09 02:51:27,395 FATAL
> [RpcServer.default.FPBQ.Fifo.handler=92,queue=2,port=60020]
> regionserver.HRegionServer: ABORTING region server
> prod003,60020,1536446665703: Could not update the index table, killing
> server region because couldn't write to an index table
>  Sep 09 02:51:27 prod003 hbase[1440]: 2018-09-09 02:51:27,395 FATAL
> [RpcServer.default.FPBQ.Fifo.handler=77,queue=7,port=60020]
> regionserver.HRegionServer: ABORTING region server
> prod003,60020,1536446665703: Could not update the index table, killing
> server region because couldn't write to an index table
>  Sep 09 02:52:19 prod003 hbase[1440]: 2018-09-09 02:52:19,224 FATAL
> [RpcServer.default.FPBQ.Fifo.handler=82,queue=2,port=60020]
> regionserver.HRegionServer: ABORTING region server
> prod003,60020,1536446665703: Could not update the index table, killing
> server region because couldn't write to an index table
>  Sep 09 02:52:28 prod003 hbase[1440]: 2018-09-09 02:52:28,922 FATAL
> 

Re: Salting based on partial rowkeys

2018-09-14 Thread Sergey Soldatov
Thomas is absolutely right that there will be a possibility of hotspotting.
Salting is the mechanism that should prevent that in all cases (because all
row keys are different). The partitioning described above can actually be
implemented by using id2 as the first column of the PK and pre-splitting by
values, without salting. The only difference is that in the suggested
approach we don't need to know the value range for that particular
column(s). If we want to implement that (well, I remember several cases
where people asked how to pre-split a table without information about the
range of values of the PK columns, to improve bulk load into a new table
without a performance loss for some queries due to salting), it would be
better to separate it from salting and call it 'partitioning' or something like
that.
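
A sketch of that alternative with hypothetical columns: put id2 first in the PK
and pre-split on known values instead of salting.

-- Hypothetical schema: 'partition' by id2 without salting, assuming the value
-- range of id2 is known well enough to pick split points up front.
CREATE TABLE metrics (
    id2       VARCHAR NOT NULL,
    id1       VARCHAR NOT NULL,
    other_key VARCHAR NOT NULL,
    val       DOUBLE
    CONSTRAINT pk PRIMARY KEY (id2, id1, other_key)
) SPLIT ON ('d', 'h', 'p', 't');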

Thanks,
Sergey

On Thu, Sep 13, 2018 at 10:09 PM Thomas D'Silva 
wrote:

> For the usage example that you provided, when you write data, how do the
> values of id_1, id_2 and other_key vary?
> I assume id_1 and id_2 remain the same while other_key is monotonically
> increasing, and that's why the table is salted.
> If you create the salt bucket only on id_2 then wouldn't you run into
> region server hotspotting during writes?
>
> On Thu, Sep 13, 2018 at 8:02 PM, Jaanai Zhang 
> wrote:
>
>> Sorry, I don't understand your purpose. According to your proposal, it
>> seems that can't be achieved. You need a hash partition. However, some things
>> need to be clarified: HBase is a range-partition engine and the salt buckets
>> are used to avoid hotspots; in other words, HBase as a storage engine can't
>> support hash partitioning.
>>
>> 
>>Jaanai Zhang
>>Best regards!
>>
>>
>>
>> On Thu, Sep 13, 2018 at 11:32 PM, Gerald Sangudi  wrote:
>>
>>> Hi folks,
>>>
>>> Any thoughts or feedback on this?
>>>
>>> Thanks,
>>> Gerald
>>>
>>> On Mon, Sep 10, 2018 at 1:56 PM, Gerald Sangudi 
>>> wrote:
>>>
 Hello folks,

 We have a requirement for salting based on partial, rather than full,
 rowkeys. My colleague Mike Polcari has identified the requirement and
 proposed an approach.

 I found an already-open JIRA ticket for the same issue:
 https://issues.apache.org/jira/browse/PHOENIX-4757. I can provide more
 details from the proposal.

 The JIRA proposes a syntax of SALT_BUCKETS(col, ...) = N, whereas Mike
 proposes SALT_COLUMN=col or SALT_COLUMNS=col, ... .

 The benefit at issue is that users gain more control over partitioning,
 and this can be used to push some additional aggregations and hash joins
 down to region servers.

 I would appreciate any go-ahead / thoughts / guidance / objections /
 feedback. I'd like to be sure that the concept at least is not
 objectionable. We would like to work on this and submit a patch down the
 road. I'll also add a note to the JIRA ticket.

 Thanks,
 Gerald


>>>
>


Re: SKIP_SCAN on variable length keys

2018-09-04 Thread Sergey Soldatov
SKIP SCAN doesn't use FuzzyRowFilter. It has its own SkipScanFilter. If you
see problems, please provide more details or file a JIRA for that.

Thanks,
Sergey

On Wed, Aug 29, 2018 at 2:17 PM Batyrshin Alexander <0x62...@gmail.com>
wrote:

>  Hello,
> I'm wondering: is there any issue with SKIP SCAN when variable-length
> columns are used in a composite key?
> My suspicion comes from FuzzyRowFilter, which takes a fuzzy row key template
> with fixed positions.


Re: TTL on a single column family in table

2018-09-04 Thread Sergey Soldatov
What is the use case for setting TTL on only a single column family? I would
say that making TTL table-wide is mostly a technical decision, because in
relational databases we operate on rows, and supporting TTL for only some
columns sounds a bit strange.

Thanks,
Sergey

On Fri, Aug 31, 2018 at 7:43 AM Domen Kren  wrote:

> Hello,
>
> we have a situation where we would like to set TTL on a single column family
> in a table. After getting errors while trying to do that through a Phoenix
> command, I found this issue,
> https://issues.apache.org/jira/browse/PHOENIX-1409, where it said "TTL -
> James Taylor and I discussed offline and we decided that for now we will
> only be supporting for all column families to have the same TTL as the
> empty column family. This means we error out if a column family is
> specified while setting TTL property - both at CREATE TABLE and ALTER TABLE
> time. Also changes were made to make sure that any new column family added
> gets the same TTL as the empty CF."
>
> If I understand correctly, this was a design decision and not a technical
> one. So my question is: if I change this configuration through the HBase API or
> console, could there be potential problems that arise in Phoenix?
>
> Thank you and best regards,
> Domen Kren
>
>
>


Re: Phoenix CsvBulkLoadTool fails with java.sql.SQLException: ERROR 103 (08004): Unable to establish connection

2018-08-20 Thread Sergey Soldatov
If I read it correctly, you are trying to use Phoenix and HBase that were
built against Hadoop 2 with Hadoop 3. Was HBase the only component you
have upgraded?

Thanks,
Sergey

On Mon, Aug 20, 2018 at 1:42 PM Mich Talebzadeh 
wrote:

> Here you go
>
> 2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper: Client
> environment:java.library.path=/home/hduser/hadoop-3.1.0/lib
> 2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper: Client
> environment:java.io.tmpdir=/tmp
> 2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper: Client
> environment:java.compiler=
> 2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper: Client
> environment:os.name=Linux
> 2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper: Client
> environment:os.arch=amd64
> 2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper: Client
> environment:os.version=3.10.0-862.3.2.el7.x86_64
> 2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper: Client
> environment:user.name=hduser
> 2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper: Client
> environment:user.home=/home/hduser
> 2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper: Client
> environment:user.dir=/data6/hduser/streaming_data/2018-08-20
> 2018-08-20 18:29:47,249 INFO  [main] zookeeper.ZooKeeper: Initiating
> client connection, connectString=rhes75:2181 sessionTimeout=9
> watcher=hconnection-0x493d44230x0, quorum=rhes75:2181, baseZNode=/hbase
> 2018-08-20 18:29:47,261 INFO  [main-SendThread(rhes75:2181)]
> zookeeper.ClientCnxn: Opening socket connection to server rhes75/
> 50.140.197.220:2181. Will not attempt to authenticate using SASL (unknown
> error)
> 2018-08-20 18:29:47,264 INFO  [main-SendThread(rhes75:2181)]
> zookeeper.ClientCnxn: Socket connection established to rhes75/
> 50.140.197.220:2181, initiating session
> 2018-08-20 18:29:47,281 INFO  [main-SendThread(rhes75:2181)]
> zookeeper.ClientCnxn: Session establishment complete on server rhes75/
> 50.140.197.220:2181, sessionid = 0x1002ea99eed0077, negotiated timeout =
> 4
> Exception in thread "main" java.sql.SQLException: ERROR 103 (08004):
> Unable to establish connection.
> at
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:455)
> at
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
> at
> org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:386)
> at
> org.apache.phoenix.query.ConnectionQueryServicesImpl.access$300(ConnectionQueryServicesImpl.java:222)
> at
> org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2318)
> at
> org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2294)
> at
> org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:76)
> at
> org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:2294)
> at
> org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:232)
> at
> org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:147)
> at
> org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:202)
> at java.sql.DriverManager.getConnection(DriverManager.java:664)
> at java.sql.DriverManager.getConnection(DriverManager.java:208)
> at
> org.apache.phoenix.util.QueryUtil.getConnection(QueryUtil.java:340)
> at
> org.apache.phoenix.util.QueryUtil.getConnection(QueryUtil.java:332)
> at
> org.apache.phoenix.mapreduce.AbstractBulkLoadTool.loadData(AbstractBulkLoadTool.java:209)
> at
> org.apache.phoenix.mapreduce.AbstractBulkLoadTool.run(AbstractBulkLoadTool.java:183)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
> at
> org.apache.phoenix.mapreduce.CsvBulkLoadTool.main(CsvBulkLoadTool.java:101)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
> Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
> at
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
> at
> org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:431)
> at
> org.apache.hadoop.hbase.client.ConnectionManager.createConnectionInternal(ConnectionManager.java:340)
> at
> 

Re: Atomic UPSERT on indexed tables

2018-06-04 Thread Sergey Soldatov
Yes, the documentation doesn't reflect the recent changes. Please see
https://issues.apache.org/jira/browse/PHOENIX-3925

Thanks,
Sergey

On Fri, Jun 1, 2018 at 5:39 PM, Miles Spielberg  wrote:

> From https://phoenix.apache.org/atomic_upsert.html:
>
> Although global indexes on columns being atomically updated are supported,
>> it’s not recommended, as potentially a separate RPC across the wire would
>> be made while the row is under lock to maintain the secondary index.
>
>
> But the parser on 4.13.1 doesn't seem to agree:
>
> 0: jdbc:phoenix:thin:url=http://192.168.99.10> CREATE TABLE T1 (A VARCHAR
> PRIMARY KEY, B VARCHAR);
>
> No rows affected (1.299 seconds)
>
> 0: jdbc:phoenix:thin:url=http://192.168.99.10> CREATE INDEX T1_B ON T1(B);
>
> No rows affected (6.285 seconds)
>
> 0: jdbc:phoenix:thin:url=http://192.168.99.10> UPSERT INTO T1(A,B)
> VALUES('hello', 'world') ON DUPLICATE KEY IGNORE;
>
> Error: Error -1 (0) : Error while executing SQL "UPSERT INTO T1(A,B)
> VALUES('hello', 'world') ON DUPLICATE KEY IGNORE": Remote driver error:
> RuntimeException: java.sql.SQLException: ERROR 1224 (42Z24): The ON
> DUPLICATE KEY clause may not be used when a table has a global index.
> tableName=T1 -> SQLException: ERROR 1224 (42Z24): The ON DUPLICATE KEY
> clause may not be used when a table has a global index. tableName=T1
> (state=0,code=-1)
>
> 0: jdbc:phoenix:thin:url=http://192.168.99.10>
>
> Am I doing something wrong here? Is the documentation inaccurate?
>
>
> Miles Spielberg
> Staff Software Engineer
>
>
> O. 650.485.1102
> 900 Jefferson Ave
> 
> Redwood City, CA 94063
> 
>


Re: Phoenix Client threads

2018-05-22 Thread Sergey Soldatov
The salting byte is calculated using a hash function over the whole row key
(using all PK columns). So if you use only one of the PK columns in the
WHERE clause, Phoenix is unable to identify which salting byte (bucket
number) should be used, so it runs scans for all salting bytes. All those
threads are lightweight, mostly waiting for a response from the HBase server,
so you may consider adjusting the nproc limit. Or you may decrease
the number of Phoenix threads via the phoenix.query.threadPoolSize property.
Decreasing the number of salt buckets can help as well.
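
For example, a client-side hbase-site.xml override (the value here is only an
illustration; size it against your app server's thread budget):

<!-- Client-side hbase-site.xml on the Phoenix JDBC client classpath.
     phoenix.query.threadPoolSize defaults to 128; example value only. -->
<property>
  <name>phoenix.query.threadPoolSize</name>
  <value>64</value>
</property>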

Thanks,
Sergey

On Tue, May 22, 2018 at 8:52 AM, Pradheep Shanmugam <
pradheep.shanmu...@infor.com> wrote:

> Hi,
>
>
>
> We have a table with key (type, id1, id2) (type is the same for all rows,
> whereas id1 and id2 are unique for each row) which is salted (30 salt
> buckets).
> The load on this table is about 30 queries/sec, with each query taking ~6ms.
> We are using the Phoenix 4.7.0 non-thin client.
> We have a query like the one below:
>
>
> SELECT tab.a, tab.b
>
> FROM tab
>
> WHERE tab.id1 = '1F64F5DY0J0A03692'
>
> AND tab.type = 4
>
> AND tab.isActive = 1;
>
>
>
> CLIENT 30-CHUNK 0 ROWS 0 BYTES PARALLEL 30-WAY ROUND ROBIN RANGE SCAN OVER
> TAB [0,4, '1F64F5DY0J0A03692']
>
> SERVER FILTER BY TS.ISACTIVE = 1
>
>
>
> Here I can see that about 30 threads are being used for this query. Here
> ‘type’ is the same for all rows, and I thought that is the reason for looking
> into all the chunks to find the key and hence using 30 threads.
>
>
>
> Then I ran the same query on a similar table with keys rearranged (id1,
> id2, type) and salted (30)
>
>
>
> But I still see the same 30 threads being used, though I thought it could uniquely
> identify a row with the given id1, which should be in one of the chunks (is this
> due to salting, so that it does not know where the key is?)
>
>
>
> CLIENT 30-CHUNK PARALLEL 30-WAY ROUND ROBIN RANGE SCAN OVER TAB [0,
> '1F64F5DY0J0A03692']
>
> SERVER FILTER BY (TYPE = 4 AND TS.ISACTIVE = 1)
>
>
>
> Currently I am exceeding the nproc limit set on my app server (phoenix
> threads 128 and hconnection threads reaching 256 = 384 threads). Can you
> please shed some light on Phoenix connections and HConnections, how to
> reduce them to a reasonable level, and also on the above query plans? Should
> we consider reducing the SALT number to 10 (we have 10 region servers)?
>
>
>
> Thanks,
>
> Pradheep
>


Re: Reply: phoenix query server java.lang.ClassCastException for BIGINT ARRAY column

2018-04-19 Thread Sergey Soldatov
Definitely, someone who is maintaining the CDH branch should take a look. I
don't observe that behavior on the master branch:

0: jdbc:phoenix:thin:url=http://localhost:876> create table if not exists
testarray(id bigint not null, events bigint array constraint pk primary key
(id));
No rows affected (2.4 seconds)
0: jdbc:phoenix:thin:url=http://localhost:876> upsert into testarray values
(1, array[1,2]);
1 row affected (0.056 seconds)
0: jdbc:phoenix:thin:url=http://localhost:876> select * from testarray;
+-+-+
| ID  | EVENTS  |
+-+-+
| 1   | [1, 2]  |
+-+-+
1 row selected (0.068 seconds)
0: jdbc:phoenix:thin:url=http://localhost:876>


Thanks,
Sergey

On Thu, Apr 19, 2018 at 12:57 PM, Lu Wei <wey...@outlook.com> wrote:

> by the way, all the queries are run in sqlline-thin.py
>
>
>
> --
> *From:* Lu Wei
> *Sent:* April 19, 2018, 6:51:15
> *To:* user@phoenix.apache.org
> *Subject:* Reply: phoenix query server java.lang.ClassCastException for BIGINT
> ARRAY column
>
>
> ## Version:
> phoenix: 4.13.2-cdh5.11.2
> hive: 1.1.0-cdh5.11.2
>
> to reproduce:
>
> -- create table
>
> create table if not exists testarray(id bigint not null, events bigint
> array constraint pk primary key (id))
>
>
> -- upsert data:
>
> upsert into testarray values (1, array[1,2]);
>
>
> -- query:
>
> select id from testarray;   -- fine
>
> select * from testarray;    -- error
> --
> *From:* sergey.solda...@gmail.com <sergey.solda...@gmail.com> on behalf of Sergey
> Soldatov <sergeysolda...@gmail.com>
> *Sent:* April 19, 2018, 6:37:06
> *To:* user@phoenix.apache.org
> *Subject:* Re: phoenix query server java.lang.ClassCastException for BIGINT
> ARRAY column
>
> Could you please be more specific? Which version of phoenix are you using?
> Do you have a small script to reproduce? At first glance it looks like a
> PQS bug.
>
> Thanks,
> Sergey
>
> On Thu, Apr 19, 2018 at 8:17 AM, Lu Wei <wey...@outlook.com> wrote:
>
> Hi there,
>
> I have a phoenix table containing an BIGINT ARRAY column. But when
> querying query server (through sqlline-thin.py), there is an exception:
>
> java.lang.ClassCastException: java.lang.Integer cannot be cast to
> java.lang.Long
>
> BTW, when query through sqlline.py, everything works fine. And data in
> HBase table are of Long type, so why does the Integer to Long cast happen?
>
>
> ## Table schema:
>
> create table if not exists gis_tracking3(tracking_object_id bigint not
> null, lat double, lon double, speed double, bearing double, time timestamp
> not null, events bigint array constraint pk primary key
> (tracking_object_id, time))
>
>
> ## when query events[1], it works fine:
>
> 0: jdbc:phoenix:thin:url=http://10.10.13.87:8> select  events[1]+1 from
> gis_tracking3;
> +--+
> | (ARRAY_ELEM(EVENTS, 1) + 1)  |
> +--+
> | 11   |
> | 2223 |
> | null |
> | null |
> | 10001|
> +--+
>
>
> ## when querying events, it throws exception:
>
> 0: jdbc:phoenix:thin:url=http://10.10.13.87:8> select  events from
> gis_tracking3;
> java.lang.ClassCastException: java.lang.Integer cannot be cast to
> java.lang.Long
>   at org.apache.phoenix.shaded.org.apache.calcite.avatica.util.Ab
> stractCursor$LongAccessor.getLong(AbstractCursor.java:550)
>   at org.apache.phoenix.shaded.org.apache.calcite.avatica.util.Ab
> stractCursor$ArrayAccessor.convertValue(AbstractCursor.java:1310)
>   at org.apache.phoenix.shaded.org.apache.calcite.avatica.util.Ab
> stractCursor$ArrayAccessor.getObject(AbstractCursor.java:1289)
>   at org.apache.phoenix.shaded.org.apache.calcite.avatica.util.Ab
> stractCursor$ArrayAccessor.getArray(AbstractCursor.java:1342)
>   at org.apache.phoenix.shaded.org.apache.calcite.avatica.util.Ab
> stractCursor$ArrayAccessor.getString(AbstractCursor.java:1354)
>   at org.apache.phoenix.shaded.org.apache.calcite.avatica.Avatica
> ResultSet.getString(AvaticaResultSet.java:257)
>   at sqlline.Rows$Row.(Rows.java:183)
>   at sqlline.BufferedRows.(BufferedRows.java:38)
>   at sqlline.SqlLine.print(SqlLine.java:1660)
>   at sqlline.Commands.execute(Commands.java:833)
>   at sqlline.Commands.sql(Commands.java:732)
>   at sqlline.SqlLine.dispatch(SqlLine.java:813)
>   at sqlline.SqlLine.begin(SqlLine.java:686)
>   at sqlline.SqlLine.start(SqlLine.java:398)
>   at sqlline.SqlLine.main(SqlLine.java:291)
>   at org.apache.phoenix.queryserver.client.SqllineWrapper.main(Sq
> llineWrapper.java:93)
>
>
> I guess there is some issue in query sever, but can't figure out why.
>
> Any suggestions?
>
>
>
> Thanks,
>
> Wei
>
>
>


Re: hint to use a global index is not working - need to find out why

2018-04-19 Thread Sergey Soldatov
That looks strange. Could you please provide the full DDLs for the table and
indexes? I just tried a similar scenario and the index is obviously used:

0: jdbc:phoenix:> create table VARIANTJOIN_RTSALTED24 (id integer primary
key, chrom_int integer, genomic_range integer);
No rows affected (6.339 seconds)
0: jdbc:phoenix:>create index jv2_chrom_int on VARIANTJOIN_RTSALTED24
(chrom_int);
No rows affected (10.016 seconds)
0: jdbc:phoenix:> explain SELECT/*+ INDEX(VJ jv2_chrom_int) */
VJ.chrom_int, genomic_range  FROM VARIANTJOIN_RTSALTED24 as VJ WHERE
(chrom_int =18 ) limit 5;
+---+
| PLAN
|
+---+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER
VARIANTJOIN_RTSALTED24   |
| CLIENT 5 ROW LIMIT
|
| SKIP-SCAN-JOIN TABLE 0
|
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER
JV2_CHROM_INT [18]  |
| SERVER FILTER BY FIRST KEY ONLY
 |
| DYNAMIC SERVER FILTER BY "VJ.ID" IN ($2.$4)
 |
| JOIN-SCANNER 5 ROW LIMIT
|
+---+
7 rows selected (0.936 seconds)


Thanks,
Sergey

On Thu, Apr 19, 2018 at 7:31 PM, Taylor, Ronald (Ronald) <
ronald.tay...@cchmc.org> wrote:

> Hello Phoenix users,
>
> I am a novice Phoenix user and this is my first post to this user list. I
> did some searching in the list archives, but could not find an answer to
> what I hope is a simple question: my global index is being ignored, even
> after I add a Hint, and I want to know why.
>
> We are using Phoenix 4.7 in the Hortonworks distribution. Looks like
> Hortonworks has been backporting at least some phoenix updates into their
> version of phoenix 4.7, so I guess it is a custom distribution. See
>
>
>
>  https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/
> bk_release-notes/content/patch_phoenix.html
>
>
>
> I have created a simple table of about 8 million rows, and about 15
> columns, with several fields having global indexes. I created the main
> table (variantjoin_rtsalted24) and its indexes, and then used a bulk loader
> to populate them from a tab-delimited file. That appeared to work fine.
>
> chrom_int is one field on which there is a global index, named
> vj2_chrom_int. And you can see the index being automatically being used
> below, where it is the only field being returned. Time required is 0.124
> sec.
>
> 0: jdbc:phoenix:> SELECT VJ.chrom_int  FROM VARIANTJOIN_RTSALTED24 as VJ
> WHERE (chrom_int =18 ) limit 5;
>
> ++
>
> | CHROM_INT  |
>
> ++
>
> | 18 |
>
> | 18 |
>
> | 18 |
>
> | 18 |
>
> | 18 |
>
> ++
>
> 5 rows selected (0.124 seconds)
>
> 0: jdbc:phoenix:>
>
> You can see that the vj2_chrom_int index is automatically being used, as I
> understand things  by the "RANGE SCAN" wording and "[0,1" in the explain
> plan:
>
> 0: jdbc:phoenix:> explain SELECT VJ.chrom_int  FROM VARIANTJOIN_RTSALTED24
> as VJ WHERE (chrom_int =18 ) limit 5;
>
> +---
> ---+
>
> |   PLAN
> |
>
> +---
> ---+
>
> | CLIENT 24-CHUNK SERIAL 24-WAY ROUND ROBIN RANGE SCAN OVER VJ2_CHROM_INT
> [0,1 |
>
> | SERVER FILTER BY FIRST KEY ONLY
> |
>
> | SERVER 5 ROW LIMIT
> |
>
> | CLIENT 5 ROW LIMIT
>   |
>
> +---
> ---+
>
> 4 rows selected (0.043 seconds)
>
> 0: jdbc:phoenix:>
>
>
> I can use a Hint to tell Phoenix to NOT use this index, as seen below. And
> that increases the time needed to 1.97 sec, over an order of magnitude more
> time than the 0.124 sec required with index use.
>
> 0: jdbc:phoenix:> SELECT /*+ NO_INDEX */ VJ.chrom_int  FROM
> VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 ) limit 5;
>
> ++
>
> | CHROM_INT  |
>
> ++
>
> | 18 |
>
> | 18 |
>
> | 18 |
>
> | 18 |
>
> | 18 |
>
> ++
>
> 5 rows selected (1.977 seconds)
>
> 0: jdbc:phoenix:>
>
> And here is the explain plan for that:
>
>
> 0: jdbc:phoenix:> explain SELECT /*+ NO_INDEX */ VJ.chrom_int  FROM
> VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 ) limit 5;
>
> +---
> ---+
>
> |
> PLAN  |
>
> +---
> ---+
>
> | CLIENT 72-CHUNK 14325202 ROWS 15099524157 BYTES PARALLEL 24-WAY ROUND
> ROBIN  |
>
> | SERVER FILTER BY CHROM_INT = 18
> |
>
> | SERVER 5 ROW LIMIT
> |
>
> | CLIENT 5 

Re: hbase cell storage different between bulk load and direct api

2018-04-19 Thread Sergey Soldatov
Heh. That looks like a bug actually. This is a 'dummy' KV (
https://phoenix.apache.org/faq.html#Why_empty_key_value), but I have some
doubts that we need it for compacted rows.

Thanks,
Sergey

On Thu, Apr 19, 2018 at 11:30 PM, Lew Jackman <lew9...@netzero.net> wrote:

> I have not tried the master branch yet; however, on Phoenix 4.13 this
> storage discrepancy in HBase is still present, with the extra
> column=M:\x00\x00\x00\x00 cells in HBase when using psql or sqlline.
>
> Does anyone have an understanding of the meaning of the column qualifier
> \x00\x00\x00\x00 ?
>
>
> -- Original Message --
> From: "Lew Jackman" <lew9...@netzero.net>
> To: user@phoenix.apache.org
> Cc: user@phoenix.apache.org
> Subject: Re: hbase cell storage different between bulk load and direct api
> Date: Thu, 19 Apr 2018 13:59:16 GMT
>
> The upsert statement appears the same as the psql results - i.e. extra
> cells. I will try the master branch next. Thanks for the tip.
>
> -- Original Message --
> From: Sergey Soldatov <sergeysolda...@gmail.com>
> To: user@phoenix.apache.org
> Subject: Re: hbase cell storage different between bulk load and direct api
> Date: Thu, 19 Apr 2018 12:26:25 +0600
>
> Hi Lew,
> no, the first one looks incorrect. You may file a bug on that (I believe
> that the second case is correct, but you may also check by uploading data
> using regular upserts). Also, you may check whether the master branch has
> this issue.
>
> Thanks,
> Sergey
>
> On Thu, Apr 19, 2018 at 10:19 AM, Lew Jackman <lew9...@netzero.net> wrote:
>
>> Under Phoenix 4.11 we are seeing some storage discrepancies in hbase
>> between a load via psql and a bulk load.
>>
>> To illustrate in a simple case we have modified the example table from
>> the load reference https://phoenix.apache.org/bulk_dataload.html
>>
>> CREATE TABLE example (
>>my_pk bigint not null,
>>m.first_name varchar(50),
>>m.last_name varchar(50)
>>CONSTRAINT pk PRIMARY KEY (my_pk))
>>IMMUTABLE_ROWS=true,
>>IMMUTABLE_STORAGE_SCHEME = SINGLE_CELL_ARRAY_WITH_OFFSETS,
>>COLUMN_ENCODED_BYTES = 1;
>>
>> Hbase Rows when Loading via PSQL
>>
>> \\x80\\x00\\x00\\x00\\x00\\x0009 column=M:\\x00\\x00\\x00\\x00,
>> timestamp=1524109827690, value=x
>> \\x80\\x00\\x00\\x00\\x00\\x0009 column=M:1, timestamp=1524109827690,
>> value=xJohnDoe\\x00\\x00\\x00\\x01\\x00\\x05\\x00\\x00\\x00\\x08\\x00\\x00\\x00\\x03\\x02
>>
>> \\x80\\x00\\x00\\x00\\x00\\x01\\x092  column=M:\\x00\\x00\\x00\\x00,
>> timestamp=1524109827690, value=x
>> \\x80\\x00\\x00\\x00\\x00\\x01\\x092  column=M:1,
>> timestamp=1524109827690, value=xMaryPoppins\\x00\\x00\\x00\\x01\\x00\\x05\\x00\\x00\\x00\\x0C\\x00\\x00\\x00\\x03\\x02
>>
>> Hbase Rows when Loading via MapReduce using CsvBulkLoadTool
>>
>> \\x80\\x00\\x00\\x00\\x00\\x0009 column=M:1, timestamp=1524110486638,
>> value=xJohnDoe\\x00\\x00\\x00\\x01\\x00\\x05\\x00\\x00\\x00\\x08\\x00\\x00\\x00\\x03\\x02
>>
>> \\x80\\x00\\x00\\x00\\x00\\x01\\x092  column=M:1,
>> timestamp=1524110486638, value=xMaryPoppins\\x00\\x00\\x00\\x01\\x00\\x05\\x00\\x00\\x00\\x0C\\x00\\x00\\x00\\x03\\x02
>>
>>
>> So, the psql load produces 4 cells for the two rows, whereas the bulk load
>> is missing two cells since it lacks the cells with column
>> qualifier :\\x00\\x00\\x00\\x00
>>
>> Is this behavior correct?
>>
>> Thanks much for any insight.
>>
>>
>>
>
>


Re: phoenix query server java.lang.ClassCastException for BIGINT ARRAY column

2018-04-19 Thread Sergey Soldatov
Could you please be more specific? Which version of phoenix are you using?
Do you have a small script to reproduce? At first glance it looks like a
PQS bug.

Thanks,
Sergey

On Thu, Apr 19, 2018 at 8:17 AM, Lu Wei  wrote:

> Hi there,
>
> I have a phoenix table containing an BIGINT ARRAY column. But when
> querying query server (through sqlline-thin.py), there is an exception:
>
> java.lang.ClassCastException: java.lang.Integer cannot be cast to
> java.lang.Long
>
> BTW, when query through sqlline.py, everything works fine. And data in
> HBase table are of Long type, so why does the Integer to Long cast happen?
>
>
> ## Table schema:
>
> create table if not exists gis_tracking3(tracking_object_id bigint not
> null, lat double, lon double, speed double, bearing double, time timestamp
> not null, events bigint array constraint pk primary key
> (tracking_object_id, time))
>
>
> ## when query events[1], it works fine:
>
> 0: jdbc:phoenix:thin:url=http://10.10.13.87:8> select  events[1]+1 from
> gis_tracking3;
> +--+
> | (ARRAY_ELEM(EVENTS, 1) + 1)  |
> +--+
> | 11   |
> | 2223 |
> | null |
> | null |
> | 10001|
> +--+
>
>
> ## when querying events, it throws exception:
>
> 0: jdbc:phoenix:thin:url=http://10.10.13.87:8> select  events from
> gis_tracking3;
> java.lang.ClassCastException: java.lang.Integer cannot be cast to
> java.lang.Long
>   at org.apache.phoenix.shaded.org.apache.calcite.avatica.util.
> AbstractCursor$LongAccessor.getLong(AbstractCursor.java:550)
>   at org.apache.phoenix.shaded.org.apache.calcite.avatica.util.
> AbstractCursor$ArrayAccessor.convertValue(AbstractCursor.java:1310)
>   at org.apache.phoenix.shaded.org.apache.calcite.avatica.util.
> AbstractCursor$ArrayAccessor.getObject(AbstractCursor.java:1289)
>   at org.apache.phoenix.shaded.org.apache.calcite.avatica.util.
> AbstractCursor$ArrayAccessor.getArray(AbstractCursor.java:1342)
>   at org.apache.phoenix.shaded.org.apache.calcite.avatica.util.
> AbstractCursor$ArrayAccessor.getString(AbstractCursor.java:1354)
>   at org.apache.phoenix.shaded.org.apache.calcite.avatica.
> AvaticaResultSet.getString(AvaticaResultSet.java:257)
>   at sqlline.Rows$Row.(Rows.java:183)
>   at sqlline.BufferedRows.(BufferedRows.java:38)
>   at sqlline.SqlLine.print(SqlLine.java:1660)
>   at sqlline.Commands.execute(Commands.java:833)
>   at sqlline.Commands.sql(Commands.java:732)
>   at sqlline.SqlLine.dispatch(SqlLine.java:813)
>   at sqlline.SqlLine.begin(SqlLine.java:686)
>   at sqlline.SqlLine.start(SqlLine.java:398)
>   at sqlline.SqlLine.main(SqlLine.java:291)
>   at org.apache.phoenix.queryserver.client.SqllineWrapper.main(
> SqllineWrapper.java:93)
>
>
> I guess there is some issue in query sever, but can't figure out why.
>
> Any suggestions?
>
>
>
> Thanks,
>
> Wei
>


Re: hbase cell storage different between bulk load and direct api

2018-04-19 Thread Sergey Soldatov
Hi Lew,
no, the first one looks incorrect. You may file a bug on that (I believe
that the second case is correct, but you may also check by uploading data
using regular upserts). Also, you may check whether the master branch has
this issue.

Thanks,
Sergey

On Thu, Apr 19, 2018 at 10:19 AM, Lew Jackman  wrote:

> Under Phoenix 4.11 we are seeing some storage discrepancies in hbase
> between a load via psql and a bulk load.
>
> To illustrate in a simple case we have modified the example table from the
> load reference https://phoenix.apache.org/bulk_dataload.html
>
> CREATE TABLE example (
>my_pk bigint not null,
>m.first_name varchar(50),
>m.last_name varchar(50)
>CONSTRAINT pk PRIMARY KEY (my_pk))
>IMMUTABLE_ROWS=true,
>IMMUTABLE_STORAGE_SCHEME = SINGLE_CELL_ARRAY_WITH_OFFSETS,
>COLUMN_ENCODED_BYTES = 1;
>
> Hbase Rows when Loading via PSQL
>
> \\x80\\x00\\x00\\x00\\x00\\x0009 column=M:\\x00\\x00\\x00\\x00,
> timestamp=1524109827690, value=x
> \\x80\\x00\\x00\\x00\\x00\\x0009 column=M:1, timestamp=1524109827690,
> value=xJohnDoe\\x00\\x00\\x00\\x01\\x00\\x05\\x00\\x00\\x00\\x08\\x00\\x00\\x00\\x03\\x02
>
> \\x80\\x00\\x00\\x00\\x00\\x01\\x092  column=M:\\x00\\x00\\x00\\x00,
> timestamp=1524109827690, value=x
> \\x80\\x00\\x00\\x00\\x00\\x01\\x092  column=M:1,
> timestamp=1524109827690, value=xMaryPoppins\\x00\\x00\\
> x00\\x01\\x00\\x05\\x00\\x00\\x00\\x0C\\x00\\x00\\x00\\x03\\x02
>
>
> Hbase Rows when Loading via MapReduce using CsvBulkLoadTool
>
> \\x80\\x00\\x00\\x00\\x00\\x0009 column=M:1, timestamp=1524110486638,
> value=xJohnDoe\\x00\\x00\\x00\\x01\\x00\\x05\\x00\\x00\\x00\\x08\\x00\\x00\\x00\\x03\\x02
>
> \\x80\\x00\\x00\\x00\\x00\\x01\\x092  column=M:1,
> timestamp=1524110486638, value=xMaryPoppins\\x00\\x00\\
> x00\\x01\\x00\\x05\\x00\\x00\\x00\\x0C\\x00\\x00\\x00\\x03\\x02
>
>
>
> So, the psql load produces 4 cells for the two rows, whereas the bulk load
> is missing two cells since it lacks the cells with column
> qualifier :\\x00\\x00\\x00\\x00
>
> Is this behavior correct?
>
> Thanks much for any insight.
>
>
>


Re: intermittent problem to query simple table

2018-04-06 Thread Sergey Soldatov
There is a JIRA on this topic: PHOENIX-4366

Thanks,
Sergey


On Thu, Apr 5, 2018 at 11:42 AM, Xu, Nan  wrote:

> Hi,
>
>
>
>   Env: hbase-1.1.4
>
>   Phoenix: 4.10
>
>I am querying a very simple table in phoenix
>
>
>
>CREATE TABLE IF NOT EXISTS BCM.PATH ( PATH VARCHAR NOT NULL, IS_BRANCH
> BOOLEAN CONSTRAINT BCM_path_pk PRIMARY KEY (PATH)) COMPRESSION='SNAPPY',
> DATA_BLOCK_ENCODING='FAST_DIFF', VERSIONS=1000, KEEP_DELETED_CELLS=true "
>
>
>
>   And run this query
>
>
>
>Select count(1) from bcm.path
>
>
>
> Sometimes, not always, it gives me an error. This is a small table with
> about 100K records, and with nothing changed it sometimes gives the right
> result.
>
>
>
> org.apache.phoenix.exception.PhoenixIOException: 
> org.apache.hadoop.hbase.DoNotRetryIOException:
> BCM:PATH,,1520015853032.88b3078dd61c0aaab1692fdc5d561dc2.: null
>
> at org.apache.phoenix.util.ServerUtil.createIOException(
> ServerUtil.java:89)
>
> at org.apache.phoenix.util.ServerUtil.throwIOException(
> ServerUtil.java:55)
>
> at org.apache.phoenix.coprocessor.
> BaseScannerRegionObserver$RegionScannerHolder.overrideDelegate(
> BaseScannerRegionObserver.java:256)
>
> at org.apache.phoenix.coprocessor.
> BaseScannerRegionObserver$RegionScannerHolder.nextRaw(
> BaseScannerRegionObserver.java:282)
>
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> scan(RSRpcServices.java:2448)
>
> at org.apache.hadoop.hbase.protobuf.generated.
> ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>
> at org.apache.hadoop.hbase.ipc.
> RpcServer.call(RpcServer.java:2117)
>
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.
> java:104)
>
> at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(
> RpcExecutor.java:133)
>
> at org.apache.hadoop.hbase.ipc.
> RpcExecutor$1.run(RpcExecutor.java:108)
>
> at java.lang.Thread.run(Thread.java:745)
>
> Caused by: java.lang.UnsupportedOperationException
>
> at org.apache.phoenix.schema.PTable$
> QualifierEncodingScheme$1.decode(PTable.java:243)
>
> at org.apache.phoenix.schema.tuple.
> EncodedColumnQualiferCellsList.add(EncodedColumnQualiferCellsList
> .java:136)
>
> at org.apache.phoenix.schema.tuple.
> EncodedColumnQualiferCellsList.add(EncodedColumnQualiferCellsList.java:55)
>
> at org.apache.hadoop.hbase.regionserver.StoreScanner.
> next(StoreScanner.java:573)
>
> at org.apache.hadoop.hbase.regionserver.KeyValueHeap.
> next(KeyValueHeap.java:147)
>
> at org.apache.hadoop.hbase.regionserver.HRegion$
> RegionScannerImpl.populateResult(HRegion.java:5516)
>
> at org.apache.hadoop.hbase.regionserver.HRegion$
> RegionScannerImpl.nextInternal(HRegion.java:5667)
>
> at org.apache.hadoop.hbase.regionserver.HRegion$
> RegionScannerImpl.nextRaw(HRegion.java:5454)
>
> at org.apache.hadoop.hbase.regionserver.HRegion$
> RegionScannerImpl.nextRaw(HRegion.java:5440)
>
> at org.apache.phoenix.coprocessor.
> UngroupedAggregateRegionObserver.doPostScannerOpen(
> UngroupedAggregateRegionObserver.java:497)
>
> at org.apache.phoenix.coprocessor.
> BaseScannerRegionObserver$RegionScannerHolder.overrideDelegate(
> BaseScannerRegionObserver.java:237)
>
>
>
>   Any hints? I don’t see anything in the region server logs that looks related.
>
>
>
> Thanks,
>
> Nan
> --
> This message, and any attachments, is for the intended recipient(s) only,
> may contain information that is privileged, confidential and/or proprietary
> and subject to important terms and conditions available at
> http://www.bankofamerica.com/emaildisclaimer. If you are not the intended
> recipient, please delete this message.
>


Re: SALT_BUCKETS and Writing

2018-04-04 Thread Sergey Soldatov
The salt byte is calculated on the Phoenix client side, based on a hash of the
row key and the number of buckets, so it's preferable to use Phoenix for writes.
If you have to use the HBase API for any reason, you need to perform the same
calculation (copy/paste the code from Phoenix).
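
A conceptual sketch of where the salt byte goes when writing through the raw
HBase API. The hash below is only a placeholder: the real byte must come from
Phoenix's own salting code (as said above, copy it from Phoenix, e.g. its
SaltingUtil class), otherwise Phoenix reads will not find the rows.

import java.nio.charset.StandardCharsets;

// Conceptual sketch only. The hash below is a PLACEHOLDER -- the actual salt
// byte must be computed with the code copied from Phoenix, or the rows will
// not be visible to Phoenix queries.
public class SaltSketch {

    static byte placeholderSaltByte(byte[] rowKey, int saltBuckets) {
        int hash = 0;
        for (byte b : rowKey) {        // NOT Phoenix's hash function
            hash = 31 * hash + b;
        }
        return (byte) (Math.abs(hash % saltBuckets));
    }

    static byte[] saltedKey(byte[] rowKey, int saltBuckets) {
        byte[] salted = new byte[rowKey.length + 1];
        salted[0] = placeholderSaltByte(rowKey, saltBuckets); // leading salt byte
        System.arraycopy(rowKey, 0, salted, 1, rowKey.length);
        return salted;
    }

    public static void main(String[] args) {
        byte[] pk = "row-0001".getBytes(StandardCharsets.UTF_8);
        byte[] hbaseRowKey = saltedKey(pk, 30); // use this as the HBase Put row
        System.out.println("salted key length: " + hbaseRowKey.length);
    }
}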

Thanks,
Sergey

On Wed, Apr 4, 2018 at 7:56 AM, Noe Detore  wrote:

> New to Phoenix, and looking to use "create table SALT_BUCKETS=". The
> documentation states "Phoenix provides a way to transparently salt the row
> key with a salting byte". Is this still done when writing to the table via
> the HBase API, or must one write using JDBC via phoenix-client.jar?
>


Re: Timestamp retreiving in Phoenix

2018-03-29 Thread Sergey Soldatov
The general answer is no. In some cases the row timestamp feature
(https://phoenix.apache.org/rowtimestamp.html) may be useful, but even then
you need a timestamp column in your table DDL.
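
For illustration, a hypothetical table using that feature: the timestamp column
is still declared in the DDL, but it maps to the HBase cell timestamp and can be
filtered with ordinary SQL.

-- Hypothetical table: CREATED is mapped to the HBase cell timestamp via
-- ROW_TIMESTAMP, yet behaves like a normal leading PK column in SQL.
CREATE TABLE events (
    created DATE    NOT NULL,
    id      VARCHAR NOT NULL,
    payload VARCHAR
    CONSTRAINT pk PRIMARY KEY (created ROW_TIMESTAMP, id)
);

-- Filtering on the timestamp is then plain SQL.
SELECT id, payload FROM events
WHERE created > TO_DATE('2018-03-01 00:00:00');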

Thanks,
Sergey

On Thu, Mar 29, 2018 at 1:14 AM, alexander.scherba...@yandex.com <
alexander.scherba...@yandex.com> wrote:

> Hello,
>
> Link [1] describes how to set a timestamp during connection creation in
> Phoenix.
>
> Is it possible to use a SQL query to get the timestamp value or to do some
> filtering on it?
>
> Thanks,
> Alexander.
>
>
> [1] https://phoenix.apache.org/faq.html#Can_phoenix_work_on_
> tables_with_arbitrary_timestamp_as_flexible_as_HBase_API
>
>


Re: A strange issue about missing data

2018-03-29 Thread Sergey Soldatov
Usually this kind of problem happens when something is wrong with the
statistics. You may try to clean SYSTEM.STATS, restart the client, and check
whether that fixes the problem. If not, you may try to turn on the DEBUG log
level on the client and check whether the generated scans cover all
regions (this task is easier to perform with clean stats).
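
As an illustration (hypothetical table name), the stats for a single table can
be dropped and regenerated instead of truncating all of SYSTEM.STATS:

-- Hypothetical table name; with namespace mapping enabled the physical name
-- may use ':' instead of '.'. Drop the guideposts, then regenerate them.
DELETE FROM SYSTEM.STATS WHERE PHYSICAL_NAME = 'MY_SCHEMA.MY_TABLE';
UPDATE STATISTICS MY_SCHEMA.MY_TABLE;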

Thanks,
Sergey


On Thu, Mar 29, 2018 at 9:15 AM, Xiaofeng Wang  wrote:

> Hi, recently I encountered a strange issue with missing data. For instance,
> columns A, B and C are the primary keys in Phoenix and I can get all the
> relevant records via the condition A='aaa'. But some time later (maybe one
> hour or more), I get fewer records than before with the same condition.
> I don't know why :(
>
> BTW: The Phoenix table is salted with bucket 13
>
> This is my environment:
>
> Phoenix: 4.8.0
> HBase: 1.2.0
>
> hbase-site.xml:
>
> hbase.regionserver.thrift.http = true
> hbase.thrift.support.proxyuser = true
> hbase.rootdir = hdfs://ns1/hbase
> hbase.master.port = 6
> hbase.master.info.port = 60010
> hbase.regionserver.port = 60020
> hbase.regionserver.info.port = 60030
> hbase.cluster.distributed = true
> hbase.zookeeper.quorum = host1:2181,host2:2181,host3:2181
> zookeeper.recovery.retry = 3
> zookeeper.session.timeout = 6
> hbase.ipc.server.callqueue.handler.factor = 0.2
> hbase.ipc.server.callqueue.read.ratio = 0.3
> hbase.ipc.server.callqueue.scan.ratio = 0.4
> hbase.client.max.perserver.tasks = 7
> hbase.client.max.perregion.tasks = 4
> hbase.offpeak.start.hour = 2
> hbase.offpeak.end.hour = 4
> hbase.storescanner.parallel.seek.enable = true
> hbase.storescanner.parallel.seek.threads = 10
> hbase.client.retries.number = 5
> hbase.client.scanner.caching = 1000
> hbase.regionserver.handler.count = 100
> hfile.block.cache.size = 0.4
> hbase.coprocessor.abortonerror = false
> hbase.regionserver.thrift.framed = false
> hbase.column.max.version = 3
> hbase.status.published = false
> hbase.status.multicast.address.port = 61000
> hbase.lease.recovery.dfs.timeout = 230
> dfs.client.sockettimeout = 100
> hbase.master.distributed.log.replay = true
> hbase.rpc.engine = org.apache.hadoop.hbase.ipc.SecureRpcEngine
> hbase.coprocessor.master.classes = org.apache.hadoop.hbase.security.access.AccessController
> hbase.coprocessor.region.classes = org.apache.hadoop.hbase.security.token.TokenProvider,org.apache.hadoop.hbase.security.access.AccessController
> hbase.superuser = hbase
> hbase.security.authorization = false
> hbase.client.write.buffer = 5242880
> hbase.hregion.max.filesize = 10737418240
> hbase.rpc.timeout = 100
> hbase.regionserver.wal.codec = org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
> hbase.client.keyvalue.maxsize = 20971520
> hbase.hregion.memstore.flush.size = 536870912
> hbase.hregion.memstore.block.multiplier = 8
> hbase.hstore.blockingStoreFiles = 300
> hbase.hstore.compactionThreshold = 10
> hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily = 320
>
> Thank you all!
>


Re: Incorrect number of rows affected from DELETE query

2018-02-22 Thread Sergey Soldatov
Hi Jins,
If you provide steps to reproduce, it would be much easier to understand
where the problem is. If nothing was deleted, the report should be 'No rows
affected'.

Thanks,
Sergey

On Mon, Feb 19, 2018 at 4:30 PM, Jins George  wrote:

> Hi,
>
> I am facing an issue in which the number of rows affected by a DELETE
> query returns an incorrect value.   The record I am trying to delete does
> not exists in the table, as evident from the first query but on deletion,
> it reports 1 row is affected.  Is this a known issue?
>
> I have tried this in Phoenix 4.7 & Phoenix 4.13 and both behaves the same
> way.
>
>
> 0: jdbc:phoenix:localhost> select accountId, subid  from test.mytable
> where accountid = '1' and subid = '1';
> +++
> | ACCOUNTID  | SUBID  |
> +++
> +++
> *No rows selected (0.017 seconds)*
> 0: jdbc:phoenix:localhost> delete from test.mytable where accountid = '1'
> and subid = '1';
> *1 row affected (0.005 seconds)*
> 0: jdbc:phoenix:localhost>
>
>
> Thanks,
> Jins George
>


Re: Changing number of salt buckets for a table

2018-02-15 Thread Sergey Soldatov
Well, there is no easy way to resalt a table. The main problem is that the
number of buckets is an input to the salt byte calculation, so if you want
to change the number of buckets, all row keys have to be rewritten. I think
you can still use an MR job for that, but I would recommend writing the data
to HFiles instead of using upserts. You can find an example of how to do
that in the CSV bulkload tool sources.

Thanks,
Sergey

On Thu, Feb 15, 2018 at 11:25 AM, Marcell Ortutay 
wrote:

> I have a phoenix table that is about 7 TB (unreplicated) in size,
> corresponding to about 500B rows. It was set up a couple years ago, and we
> have determined that the number of salt buckets it has is not optimal for
> the current query pattern we are seeing. I want to change the number of
> salt buckets as I expect it will improve performance.
>
> I have written a MapReduce job that does this using a subclass of
> TableMapper. It scans the entire old table, and writes the re-salted data
> to a new table. The MapReduce job works on small tables, but I'm having
> trouble getting it to run on the larger table.
>
> I have two questions for anyone who has experience with this:
>
> (1) Are there any publicly available MapReduce jobs for re-salting a
> Phoenix table?
>
> (2) Generally, is there a better approach than MapReduce to re-salt a
> Phoenix table?
>
> Thanks,
> Marcell Ortutay
>
>


Re: High CPU usage on Hbase region Server with GlobalMemoryManager warnings

2018-02-01 Thread Sergey Soldatov
That kind of message may appear when queries that use the memory manager
(usually joins and GROUP BY) timed out or failed for some reason. So the
message itself is hardly related to CPU usage or GC.
BUT it may mean that your region servers are unable to handle that kind of
workload properly.
Since you say this issue started around the Yarn activity, I would suggest
checking swappiness and huge pages (there are quite a lot of resources on
the Internet about how they affect HBase). You may simply be running out of
hardware resources.

Thanks,
Sergey

On Wed, Jan 31, 2018 at 6:40 PM, Jins George  wrote:

> Hi,
>
> On analyzing a prod issue of High CPU usage on Hbase Region server, I came
> across warning messages from region server logs complaining about Orphaned
> chunk of memory.
>
> 2018-01-30 19:16:31,565 WARN org.apache.phoenix.memory.GlobalMemoryManager: 
> Orphaned chunk of 104000 bytes found during finalize
> 2018-01-30 19:16:31,565 WARN org.apache.phoenix.memory.GlobalMemoryManager: 
> Orphaned chunk of 104000 bytes found during finalize
>
>
> The high CPU usage looks like due to garbage collection and it lasted for
> almost 6 hours.  And throughout 6 hours, region server logs had these
> warning messages logged.
>
> Cluster Details:
> 4 node( 1 master + 3 slaves)  cdh cluster
> Hbase version 1.2
> Phoenix version 4.7
> Region Server Heap : 4G
> Total Regions: ~135
> Total tables : ~35
>
> Out of 3 region servers, 2 of them had the warning logs and both suffered
> high CPU. Third region server nither had High CPU nor the warning logs. Any
> idea why these messages are logged and can that trigger continuous GC ?
>
> Before this issue started( or around the same time) huge application log
> files were copied to HDFS by Yarn.. But can't think of that causing issue
> on Hbase Region  server.
>
> Any help is appreciated.
>
> Thanks,
> Jins George
>


Re: Why the bulkload does not support update index data?

2018-01-22 Thread Sergey Soldatov
What do you mean by 'bulkload can not update index data'? During the
bulkload, the MR job creates HFiles for the table and all corresponding
indexes and uses the regular HBase bulkload to load them. Did you hit a
problem during the HBase bulkload of the generated index HFiles?

Thanks,
Sergey

On Fri, Jan 19, 2018 at 4:19 AM, venkata subbarayudu 
wrote:

> Is the Phoenix bulk loader not updating the index all the time, or only
> in specific scenarios?
>
> On Thu, Jan 18, 2018 at 7:51 AM, cloud.pos...@gmail.com <
> cloud.pos...@gmail.com> wrote:
>
>> The index data will be dirty if bulkload can not update index data.
>>
>
>
>
> --
> *Venkata Subbarayudu Amanchi.*
>


Re: Reading Phoenix Upserted data directly from Hbase

2017-12-01 Thread Sergey Soldatov
HBase doesn't know about the data types that you are using in Phoenix, so
it operates on byte arrays. The HBase shell shows printable ASCII characters
as-is and hex values for the rest. You may use the phoenix-spark module to
work with Phoenix from Spark.

Thanks,
Sergey

On Thu, Nov 30, 2017 at 11:22 PM, Vaghawan Ojha 
wrote:

> Hi,
>
> I have a few Phoenix tables created from Phoenix itself; they work fine
> with Phoenix. However, when I try to scan the data from the HBase shell,
> binary strings get printed instead of the real values I can see in
> Phoenix.
>
> Also, there are cases when I want to fetch them directly from hbase and
> work with spark. I guess I'm doing something wrong with the configuration
> of phoenix, or is this the expected result?
>
> I'm using phoenix-4.12.0-HBase-1.2 .
>
> Any reply would be appreciated.
>
> Thanks
>


Re: Bulk loading into table vs view

2017-11-28 Thread Sergey Soldatov
Please take a look at https://phoenix.apache.org/views.html
All views are 'virtual' tables: they don't have a dedicated physical table
and operate on top of the table that is specified in the view DDL.
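
That also explains the "No rows affected" you saw: creating a view only adds
metadata, nothing is copied or materialized. A small sketch (the view name
and predicate are hypothetical):

CREATE VIEW lineitem_returns AS SELECT * FROM lineitem WHERE l_returnflag = 'R';
-- queries against the view read the underlying physical table directly
SELECT COUNT(*) FROM lineitem_returns;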

Thanks,
Sergey

On Sat, Nov 25, 2017 at 6:25 AM, Eisenhut, Roman 
wrote:

> Dear Phoenix-Team,
>
>
>
> I did some test on bulk-loading data with the psql.py script in
> $PHOENIX_HOME/bin and the tpc-h data on my cluster with 1 master and 3 RS.
> I’ve found that it makes quite a difference whether you:
>
>1. Create a table
>2. Bulk load data into that table
>
> Or
>
>1. Create a table
>2. Create a view
>3. Bulk load data in the view
>
>
>
> I was wondering where the overhead is coming from? (you can find my
> numbers below)
>
>
>
> Additionally, I created a view over a table which was already filled and
> phoenix returned “No rows affected”. At the same time I can’t find a table
> in HBase that reflects the view, which makes me wonder whether views are
> actually materialized somewhere. As I’m quite interested in the view
> functionality of Phoenix, I was wondering whether someone can explain what
> is happening when a view is created?
>
>
>
> Best regards,
>
> Roman
>
>
>
>
>
> *psql.py -t X -d '|' X.csv, where X = table name*
>
> | ID  | region (5) | nation (25.00) | supplier (10,000) | customer (150,000) | part (200,000) | partsupp (800,000) | orders (1,500,000) | lineitem (6,001,215) |
> | 1   | 0.068      | 0.11           | 2.959             | 18.789             | 27.881         | 107.03             | 164.853            | 1007.315             |
> | 2   | 0.124      | 0.093          | 2.993             | 19.62              | 26.954         | 80.671             | 169.038            | 1039.294             |
> | 3   | 0.07       | 0.092          | 2.795             | 20.745             | 29.036         | 76.855             | 177.765            | 1042.642             |
> | 4   | 0.132      | 0.101          | 2.89              | 20.527             | 28.121         | 78.956             | 180.145            | 1019.047             |
> | 5   | 0.072      | 0.116          | 3.334             | 27.494             | 28.891         | 75.455             | 166.668            | 1011.299             |
> | MIN | 0.068      | 0.092          | 2.795             | 18.789             | 26.954         | 75.455             | 164.853            | 1007.315             |
> | MAX | 0.132      | 0.116          | 3.334             | 27.494             | 29.036         | 107.03             | 180.145            | 1042.642             |
> | AVG | 0.0932     | 0.1024         | 2.9942            | 21.435             | 28.1766        | 83.7934            | 171.6938           | 1023.919             |
>
> *psql.py -t X_VIEW -d '|' X.csv, where X = table name*
>
> | ID  | region (5) | nation (25.00) | supplier (10,000) | customer (150,000) | part (200,000) | partsupp (800,000) | orders (1,500,000) | lineitem (6,001,215) |
> | 1   | 0.103      | 0.159          | 2.644             | 22.702             | 28.424         | 93.897             | 201.449            |                      |
> | 2   | 0.097      | 0.138          | 2.641             | 20.926             | 32.014         | 95.195             | 190.939            |                      |
> | 3   | 0.123      | 0.076          | 3.097             | 19.88              | 38.426         | 90.613             | 193.376            |                      |
> | 4   | 0.092      | 0.098          | 3.14              | 23.522             | 29.509         | 99.443             | 192.348            |                      |
> | 5   | 0.089      | 0.146          | 2.938             | 22.196             | 34.407         | 93.898             | 198.012            |                      |
> | MIN | 0.089      | 0.076          | 2.641             | 19.88              | 28.424         | 90.613             | 190.939            | 0                    |
> | MAX | 0.123      | 0.159          | 3.14              | 23.522             | 38.426         | 99.443             | 201.449            | 0                    |
> | AVG | 0.1008     | 0.1234         | 2.892             | 21.8452            | 32.556         | 94.6092            | 195.2248           | #DIV/0!              |
>
>
>


Re: Pool size / queue size with thin client

2017-11-14 Thread Sergey Soldatov
Make sure that you have restarted PQS as well and it has the updated
hbase-site.xml in the classpath.

Thanks,
Sergey

On Tue, Nov 14, 2017 at 6:53 AM, Stepan Migunov <
stepan.migu...@firstlinesoftware.com> wrote:

> Hi,
>
> Could you please suggest how I can change pool size / queue size when
> using thin client? I have added to hbase-site.xml the following options:
>
> 
> phoenix.query.threadPoolSize
> 2000
> 
>
> 
> phoenix.query.queueSize
> 10
> 
>
> restarted hbase (master and regions), but still receive the following
> response (via JDBC-thin client):
>
> Remote driver error: RuntimeException: 
> org.apache.phoenix.exception.PhoenixIOException:
> Task org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask@69529e2
> rejected from org.apache.phoenix.job.JobManager$1@48b8311c[Running, pool
> size = 128, active threads = 128, queued tasks = 5000, completed tasks = 0]
>
> My guess that settings are not applied and default values (128/5000) still
> used.
> What's wrong?
>
> Thanks,
> Stepan.
>
>
>


Re: Transfer all data to a new phoenix cluster

2017-11-14 Thread Sergey Soldatov
You may use the standard procedures for copying HBase tables across
clusters (CopyTable, snapshot-based copy; there are a lot of articles on
this topic). The execution time depends on the network speed: over 1 Gbit it
would take something close to 5-6 hours. If you don't want any downtime and
plan to continuously ingest data, you may consider replication as the way to
copy the data, but I'm not sure about its speed.

Thanks,
Sergey

On Mon, Nov 13, 2017 at 6:27 PM, Cheyenne Forbes <
cheyenne.osanu.for...@gmail.com> wrote:

> Is there a standard way to move all apache phoenix data to a new cluster?
> and how long would it take to move 2 terabytes of phoenix rows?
>
> senario:
>
> I launched my platform using virtual servers (as its cheaper) but now I am
> ready to move to dedicated servers but I want to know the right way to move
> the data to a new cluster so my users dont hunt me down to torture me for
> messing up the data they trusted my platform with.
>
> Regards,
> Cheyenne
>


Re: Tuning MutationState size

2017-11-09 Thread Sergey Soldatov
Could you provide the version you are using? Do you have autocommit turned
on and have you changed the following properties:
phoenix.mutate.batchSize
phoenix.mutate.maxSize
phoenix.mutate.maxSizeBytes

Thanks,
Sergey

If you are using more recent version, than you may consider to
On Thu, Nov 9, 2017 at 5:41 AM, Marcin Januszkiewicz <
januszkiewicz.mar...@gmail.com> wrote:

> I was trying to create a global index table but it failed out with:
>
> Error: ERROR 730 (LIM02): MutationState size is bigger than maximum
> allowed number of bytes (state=LIM02,code=730)
> java.sql.SQLException: ERROR 730 (LIM02): MutationState size is bigger
> than maximum allowed number of bytes
> at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.
> newException(SQLExceptionCode.java:489)
> at org.apache.phoenix.exception.SQLExceptionInfo.buildException(
> SQLExceptionInfo.java:150)
> at org.apache.phoenix.execute.MutationState.throwIfTooBig(
> MutationState.java:359)
> at org.apache.phoenix.execute.MutationState.join(
> MutationState.java:447)
> at org.apache.phoenix.compile.MutatingParallelIteratorFactor
> y$1.close(MutatingParallelIteratorFactory.java:98)
> at org.apache.phoenix.iterate.RoundRobinResultIterator$
> RoundRobinIterator.close(RoundRobinResultIterator.java:298)
> at org.apache.phoenix.iterate.RoundRobinResultIterator.next(
> RoundRobinResultIterator.java:105)
> at org.apache.phoenix.compile.UpsertCompiler$2.execute(
> UpsertCompiler.java:821)
> at org.apache.phoenix.compile.DelegateMutationPlan.execute(
> DelegateMutationPlan.java:31)
> at org.apache.phoenix.compile.PostIndexDDLCompiler$1.
> execute(PostIndexDDLCompiler.java:117)
> at org.apache.phoenix.query.ConnectionQueryServicesImpl.
> updateData(ConnectionQueryServicesImpl.java:3360)
> at org.apache.phoenix.schema.MetaDataClient.buildIndex(
> MetaDataClient.java:1283)
> at org.apache.phoenix.schema.MetaDataClient.createIndex(
> MetaDataClient.java:1595)
> at org.apache.phoenix.compile.CreateIndexCompiler$1.execute(
> CreateIndexCompiler.java:85)
> at org.apache.phoenix.jdbc.PhoenixStatement$2.call(
> PhoenixStatement.java:394)
> at org.apache.phoenix.jdbc.PhoenixStatement$2.call(
> PhoenixStatement.java:377)
> at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
> at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(
> PhoenixStatement.java:376)
> at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(
> PhoenixStatement.java:364)
> at org.apache.phoenix.jdbc.PhoenixStatement.execute(
> PhoenixStatement.java:1738)
> at sqlline.Commands.execute(Commands.java:822)
> at sqlline.Commands.sql(Commands.java:732)
> at sqlline.SqlLine.dispatch(SqlLine.java:813)
> at sqlline.SqlLine.begin(SqlLine.java:686)
> at sqlline.SqlLine.start(SqlLine.java:398)
> at sqlline.SqlLine.main(SqlLine.java:291)
>
> Is there a way to predict what max size will be sufficient, or which other
> knobs to turn?
>
>
> --
> Pozdrawiam,
> Marcin Januszkiewicz
>


Re: SELECT + ORDER BY vs self-join

2017-10-31 Thread Sergey Soldatov
I agree with James that this happens because the index was not used, since
it doesn't cover all the columns. I believe that in the second case the RHT
uses the index to create a list of row keys, which are then used for point
lookups via a skip scan.

bq. When is using the self-join a worse choice than the simple select?

Hash joins have their own limitations:
1. The RHT is supposed to be small, so it's better to keep the LIMIT small
(far less than 30 mil).
2. The client is always involved: it collects the data for the RHT, builds
the hash join cache and sends it to all region servers.

bq. Is there a better way to construct this query?

Using a local index may help in this case.
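
One possible direction, sketched with the names from this thread (untested;
the exact index definition depends on your data and query patterns):

CREATE LOCAL INDEX traces_number_idx ON traces (UPPER(number), time DESC);

SELECT /*+ INDEX(traces traces_number_idx) */ *
FROM traces
WHERE UPPER(number) LIKE 'PO %'
ORDER BY time DESC, rowkey
LIMIT 101;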

Thanks,
Sergey


On Mon, Oct 30, 2017 at 11:26 PM, James Taylor 
wrote:

> Please file a JIRA and include the explain plan for each of the queries. I
> suspect your index is not being used in the first query due to the
> selection of all the columns. You can try hinting the query to force your
> index to be used. See https://phoenix.apache.org/secondary_indexing.html#
> Index_Usage
>
> Thanks,
> James
>
> On Mon, Oct 30, 2017 at 7:02 AM, Marcin Januszkiewicz <
> januszkiewicz.mar...@gmail.com> wrote:
>
>> We have a wide table with 100M records created with the following DDL:
>>
>> CREATE TABLE traces (
>>   rowkey VARCHAR PRIMARY KEY,
>>   time VARCHAR,
>>   number VARCHAR,
>>   +40 more columns)
>>
>> We want to select a large (~30M records) subset of this data with the
>> query:
>>
>> SELECT *all columns*
>>   FROM traces
>>   WHERE (UPPER(number) LIKE 'PO %')
>>   ORDER BY time DESC, ROWKEY
>>   LIMIT 101;
>>
>> This times out after 15 minutes and puts a huge load on our cluster.
>> We have an alternate way of selecting this data:
>>
>> SELECT t.rowkey, *all columns*
>> FROM TRACES t
>> JOIN (
>>   SELECT rowkey
>>   FROM TRACES
>>   WHERE (UPPER(number) LIKE 'PO %')
>>   ORDER BY time DESC, ROWKEY
>>   LIMIT 101
>> ) ix
>> ON t.ROWKEY = ix.ROWKEY
>> order by t.ROWKEY;
>>
>> Which completes in just under a minute.
>> Is there a better way to construct this query?
>> When is using the self-join a worse choice than the simple select?
>> Given that we have a functional index on UPPER(number), could this
>> potentially be a statistics-based optimizer decision?
>>
>> --
>> Pozdrawiam,
>> Marcin Januszkiewicz
>>
>
>


Re: Querying table with index fails in some configurations.

2017-10-30 Thread Sergey Soldatov
It's reproducible on my box which is maybe several days behind the master
branch, so feel free to file a JIRA.

Thanks,
Sergey


Re: Cloudera parcel update

2017-10-25 Thread Sergey Soldatov
Hi Flavio,

It looks like you need to ask the vendor, not the community about their
plan for further releases.

Thanks,
Sergey

On Wed, Oct 25, 2017 at 2:21 PM, Flavio Pompermaier 
wrote:

> Hi to all,
> the latest Phoenix Cloudera parcel I can see is 4.7...any plan to release
> a newer version?
>
> I'd need at least Phoenix 4.9..anyone using it?
>
> Best,
> Flavio
>


Re: Performance of Inserting HBASE Phoenix table via Hive

2017-10-09 Thread Sergey Soldatov
You need to remember that inserting into Phoenix from Hive goes through an
additional layer (the storage handler), which is not optimized the way ORC
or other Hive-specific formats are. So you should expect it to be visibly
slower than a regular Hive table and very slow compared to regular Phoenix
upserts (a dozen times or even more). If you need to copy a lot of data from
Hive to Phoenix, the CSV bulkload is the best way to perform such an
operation.

Thanks,
Sergey

On Tue, Oct 10, 2017 at 4:31 AM, sudhir patil 
wrote:

>
>
> What are the performance implications of inserting into an HBase/Phoenix
> table via Hive? Are there any good practices around it? How does the
> performance compare to JDBC inserts or the Phoenix CSV upload?
>
> Any pointers would be of great help.
>
>
>


Re: SQLline and binary columns

2017-09-27 Thread Sergey Soldatov
Please check https://phoenix.apache.org/language/functions.html for
functions that work with binary data, such as GET_BIT and GET_BYTE.
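
A hedged example of how such a predicate could look (the table and column
names are made up, and the offsets depend on your data):

SELECT * FROM my_table
WHERE GET_BYTE(bin_col, 0) = 127
  AND GET_BYTE(bin_col, 1) = 0;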

Thanks,
Sergey

On Wed, Sep 27, 2017 at 2:46 PM, Jon Strayer 
wrote:

> Is there a way to query a table based on a binary column (16 bytes)?
>
>
>
> —
>
> *Jon Strayer *l Sr. Software Engineer
>
> Proofpoint, Inc.
> M: 317-440-7938
> E: jstra...@proofpoint.com
>
>
> threat protection l compliance l archiving & governance l secure
> communication
>


Re: Race condition around first/last value or group by?

2017-09-19 Thread Sergey Soldatov
Sounds like a bug. If you have a reproducible case (DDLs + sample data +
query), please file a JIRA. Actually, the number of cores should not affect
query execution. The behavior you described means that the KV pair for the
projection was built incorrectly, and that may happen on the server side.

Thanks,
Sergey

On Tue, Sep 19, 2017 at 7:52 AM, Jon Strayer 
wrote:

> I have a SQL query that looks in part like this:
>
>
>
>First_value(si.ip_addr)
>
>  within GROUP (ORDER BY si.source DESC) AS srcIp,
>
>First_value(si.port)
>
>  within GROUP (ORDER BY si.source DESC) AS srcPort,
>
>Last_value(si.ip_addr)
>
>  within GROUP (ORDER BY si.source DESC) AS destIp,
>
>Last_value(si.port)
>
>  within GROUP (ORDER BY si.source DESC) AS destPort
>
>
>
> Sometimes it returns the port values in the ip column and the ip values in
> the port column.  If I delete the group by and the first/last sections it
> seems to work fine.
>
>
>
> If I run the query with 2-7 cores it runs fine (1 and 8 fail).
>
>
>
> The part about the number of cores affecting the bug makes me pretty sure
> it’s a race condition (along with the fact that printing out values changes
> the bug too).  I’ve looked at the client code and haven’t seen anything
> obvious.  And it just occurred to me that I suppose it could be the server
> code too.
>
>
>
> Has anyone else seen anything like this?
>
>
>
> Version numbers:
>
> sqlline version 1.1.8
>
> 4.7.0-clabs-phoenix1.3.0.14
>
>
>
>
>
> —
>
> *Jon Strayer *l Sr. Software Engineer
>
> Proofpoint, Inc.
> M: 317-440-7938
> E: jstra...@proofpoint.com
>
>
> threat protection l compliance l archiving & governance l secure
> communication
>


Re: Phoenix CSV Bulk Load fails to load a large file

2017-09-06 Thread Sergey Soldatov
Do you have more details on the version of Phoenix/HBase you are using as
well as how it hangs (Exceptions/messages that may help to understand the
problem)?

Thanks,
Sergey

On Wed, Sep 6, 2017 at 1:13 PM, Sriram Nookala  wrote:

> I'm trying to load a 3.5G file with 60 million rows using CsvBulkLoadTool.
> It hangs while loading HFiles. This runs successfully if I split this into
> 2 files, but I'd like to avoid doing that. This is on Amazon EMR, is this
> an issue due to disk space or memory. I have a single master and 2 region
> server configuration with 16 GB memory on each node.
>


Re: Phoenix csv bulkload tool not using data in rowtimestamp field for hbase timestamp

2017-08-28 Thread Sergey Soldatov
Hi Rahul,

It seems that you ran into
https://issues.apache.org/jira/browse/PHOENIX-3406. You may apply the patch
and rebuild Phoenix. Sorry, I actually thought it had already been
integrated; I will revisit it.

Thanks,
Sergey.

On Sun, Aug 20, 2017 at 3:57 AM, rahuledavalath1 
wrote:

> Hi All,
>
>
> I am using HBase 1.1 and Phoenix 4.9.0. I linked my date column in the
> Phoenix table to the HBase timestamp using the row timestamp feature.
> When I insert data with an upsert query it works fine, but when I do the
> Phoenix CSV MapReduce bulkload it takes the bulkload time as the HBase
> timestamp.
>
> Will this feature not work with the Phoenix bulkload? Or is there any
> other way I can achieve this in the bulkload itself?
>
> Thanks & Regards
> Rahul
>
>
>
>


Re: Potential causes for very slow DELETEs?

2017-08-18 Thread Sergey Soldatov
Hi Pedro,

Usually that kind of behavior is reflected in the region server logs. Try
turning on the DEBUG level and check what exactly the RS is doing during
that time. You may also take a thread dump of the RS during the execution
and see what the RPC handlers are doing. One thing that should be checked
first is the RPC handlers: if they are all busy, you may consider increasing
the number of handlers. If you have the RPC scheduler and controller
configured, double check that the regular handlers are used rather than the
index RPC handlers (there was a bug where the client sent all RPCs with
index priority). If you see that, remove the controller factory property on
the client side.

Thanks,
Sergey

On Fri, Aug 18, 2017 at 4:46 AM, Pedro Boado  wrote:

> Hi all,
>
> We have two HBase 1.0 clusters running the same process in parallel
> -effectively keeps the same data in both Phoenix tables-
>
> This process feeds data into Phoenix 4.5 via HFile and once the data is
> loaded a Spark process deletes a few thousand rows from the tables
> -secondary indexing is disabled in our installation- .
>
> After an HBase restart -no config changes involved-, one of the clusters
> have started running these deletes too slowly (the fast run is taking 5min
> and the slow one around 1h). And more worryingly while the process is
> running Phoenix queries are taking hundreds of seconds instead of being sub
> second (even opening sqlline is very slow).
>
> We've almost run out of ideas trying to find the cause of this behaviour.
> There are no evident GC pauses, CPU usage,  Hdfs IO is normal, Memory usage
> is normal, etc.
>
> As soon as the delete process finishes Phoenix goes back to normal
> behaviour.
>
> Does anybody have any ideas for potential causes of this behaviour?
>
> Many thanks!!
>
> Pedro.
>


Re: hash cache errors

2017-07-26 Thread Sergey Soldatov
Well, PHOENIX-4010 should not happen often. If your tables have more
regions than the number of region servers, you may use the per-table HBase
load balancer. In that case every region server will host some regions of
each table, so there is no chance that a region gets moved to an RS that
does not have the hash cache. To confirm the issue, check the master log
around the time the problem happened and see whether any region was moved.
Also check the execution time of the query with reason (1) in mind.

Thanks,
Sergey

On Wed, Jul 26, 2017 at 4:42 PM, Mike Prendergast <
mikeprenderg...@iotait.com> wrote:

> I think https://issues.apache.org/jira/browse/PHOENIX-4010 may be the
> issue for us, is there a way I can confirm that is the case? Can I force a
> region server to update its join cache in some way, as a workaround?
>
> Michael Prendergast
> *iota IT*
> Vice President / Software Engineer
> (cell) 703.594.1053 <(703)%20594-1053>
> (office)  571.386.4682 <(571)%20386-4682>
> (fax) 571.386.4681 <(571)%20386-4681>
>
> This e-mail and any attachments to it are intended only for the identified
> recipient(s). It may contain proprietary or otherwise legally protected
> information of Iota IT, Inc. Any unauthorized distribution, use or
> disclosure of this communication is strictly prohibited. If you have
> received this communication in error, please notify the sender and delete
> or otherwise destroy the e-mail and all attachments immediately.
>
> On Fri, Jul 21, 2017 at 8:38 PM, Sergey Soldatov <sergeysolda...@gmail.com
> > wrote:
>
>> Hi Mike,
>>
>> There are a couple reasons why it may happen:
>> 1. server side cache expired. Time to live can be changed by
>> phoenix.coprocessor.maxServerCacheTimeToLiveMs
>> 2. Region has been moved to another region server where the join cache is
>> missing. Look at https://issues.apache.org/jira/browse/PHOENIX-4010
>>
>> Thanks,
>> Sergey
>>
>> On Fri, Jul 21, 2017 at 5:00 PM, Mike Prendergast <
>> mikeprenderg...@iotait.com> wrote:
>>
>>> I am connecting to an EMR 5.6 cluster running Phoenix 4.9 using the
>>> Phoenix JDBC thick client, and getting these errors consistently. Can
>>> somebody point me in the right direction as to what the issue might be?
>>>
>>> org.apache.phoenix.exception.PhoenixIOException:
>>> org.apache.hadoop.hbase.DoNotRetryIOException: Could not find hash
>>> cache for joinId: i�K���
>>>�.
>>> The cache might have expired and have been removed.
>>> at org.apache.phoenix.coprocessor.HashJoinRegionScanner.(
>>> HashJoinRegionScanner.java:102)
>>> at org.apache.phoenix.iterate.NonAggregateRegionScannerFactory.
>>> getRegionScanner(NonAggregateRegionScannerFactory.java:148)
>>> at org.apache.phoenix.coprocessor.ScanRegionObserver.doPostScan
>>> nerOpen(ScanRegionObserver.java:72)
>>> at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$Reg
>>> ionScannerHolder.overrideDelegate(BaseScannerRegionObserver.java:221)
>>> at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$Reg
>>> ionScannerHolder.nextRaw(BaseScannerRegionObserver.java:266)
>>> at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRp
>>> cServices.java:2633)
>>> at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRp
>>> cServices.java:2837)
>>> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Clie
>>> ntService$2.callBlockingMethod(ClientProtos.java:34950)
>>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2339)
>>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
>>> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecu
>>> tor.java:188)
>>> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
>>> (state=08000,code=101)
>>> org.apache.phoenix.exception.PhoenixIOException:
>>> org.apache.phoenix.exception.PhoenixIOException:
>>> org.apache.hadoop.hbase.DoNotRetryIOException: Could not find hash
>>> cache for joinId: i�K���
>>>  �. The cache might have expired and have
>>> been removed.
>>> at org.apache.phoenix.coprocessor.HashJoinRegionScanner.(
>>> HashJoinRegionScanner.java:102)
>>> at org.apache.phoenix.iterate.NonAggregateRegionScannerFactory.
>>> getRegionScanner(NonAggregateRegionScannerFactory.java:148)
>>> at org.apache.phoenix.coprocessor.ScanRegionObserver.doPostScan
>>> nerOpen(ScanRegionObserver.java:72)
>>> at o

Re: hash cache errors

2017-07-21 Thread Sergey Soldatov
Hi Mike,

There are a couple of reasons why it may happen:
1. The server-side cache expired. The time to live can be changed via
phoenix.coprocessor.maxServerCacheTimeToLiveMs.
2. A region has been moved to another region server where the join cache is
missing. Look at https://issues.apache.org/jira/browse/PHOENIX-4010

Thanks,
Sergey

On Fri, Jul 21, 2017 at 5:00 PM, Mike Prendergast <
mikeprenderg...@iotait.com> wrote:

> I am connecting to an EMR 5.6 cluster running Phoenix 4.9 using the
> Phoenix JDBC thick client, and getting these errors consistently. Can
> somebody point me in the right direction as to what the issue might be?
>
> org.apache.phoenix.exception.PhoenixIOException:
> org.apache.hadoop.hbase.DoNotRetryIOException: Could not find hash cache
> for joinId: i�K���
>�. The
> cache might have expired and have been removed.
> at org.apache.phoenix.coprocessor.HashJoinRegionScanner.(
> HashJoinRegionScanner.java:102)
> at org.apache.phoenix.iterate.NonAggregateRegionScannerFactory.
> getRegionScanner(NonAggregateRegionScannerFactory.java:148)
> at org.apache.phoenix.coprocessor.ScanRegionObserver.doPostScan
> nerOpen(ScanRegionObserver.java:72)
> at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$Reg
> ionScannerHolder.overrideDelegate(BaseScannerRegionObserver.java:221)
> at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$Reg
> ionScannerHolder.nextRaw(BaseScannerRegionObserver.java:266)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(
> RSRpcServices.java:2633)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(
> RSRpcServices.java:2837)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$
> ClientService$2.callBlockingMethod(ClientProtos.java:34950)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2339)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecu
> tor.java:188)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> (state=08000,code=101)
> org.apache.phoenix.exception.PhoenixIOException:
> org.apache.phoenix.exception.PhoenixIOException:
> org.apache.hadoop.hbase.DoNotRetryIOException: Could not find hash cache
> for joinId: i�K���
>  �. The cache might have expired and have been
> removed.
> at org.apache.phoenix.coprocessor.HashJoinRegionScanner.(
> HashJoinRegionScanner.java:102)
> at org.apache.phoenix.iterate.NonAggregateRegionScannerFactory.
> getRegionScanner(NonAggregateRegionScannerFactory.java:148)
> at org.apache.phoenix.coprocessor.ScanRegionObserver.doPostScan
> nerOpen(ScanRegionObserver.java:72)
> at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$Reg
> ionScannerHolder.overrideDelegate(BaseScannerRegionObserver.java:221)
> at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$Reg
> ionScannerHolder.nextRaw(BaseScannerRegionObserver.java:266)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(
> RSRpcServices.java:2633)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(
> RSRpcServices.java:2837)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$
> ClientService$2.callBlockingMethod(ClientProtos.java:34950)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2339)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecu
> tor.java:188)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecu
> tor.java:168)
>
> at org.apache.phoenix.util.ServerUtil.parseServerException(Serv
> erUtil.java:116)
> at org.apache.phoenix.iterate.BaseResultIterators.getIterators(
> BaseResultIterators.java:875)
> at org.apache.phoenix.iterate.BaseResultIterators.getIterators(
> BaseResultIterators.java:819)
> at org.apache.phoenix.iterate.RoundRobinResultIterator.getItera
> tors(RoundRobinResultIterator.java:176)
> at org.apache.phoenix.iterate.RoundRobinResultIterator.next(Rou
> ndRobinResultIterator.java:91)
> at org.apache.phoenix.iterate.DelegateResultIterator.next(Deleg
> ateResultIterator.java:44)
> at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultS
> et.java:778)
>
>
> Michael Prendergast
> *iota IT*
> Vice President / Software Engineer
> (cell) 703.594.1053 <(703)%20594-1053>
> (office)  571.386.4682 <(571)%20386-4682>
> (fax) 571.386.4681 <(571)%20386-4681>
>
> This e-mail and any attachments to it are intended only for the identified
> recipient(s). It may contain proprietary or otherwise legally protected
> information of Iota IT, Inc. Any unauthorized distribution, use or
> disclosure of this communication is strictly prohibited. If you have
> received this communication in error, please notify the sender and delete
> or otherwise destroy the e-mail and all attachments immediately.
>


Re: ArrayIndexOutOfBounds excpetion

2017-07-19 Thread Sergey Soldatov
Hi Siddharth
The problem that was described in PHOENIX-3196 (as well as in PHOENIX-930
and several others) is that we sent metadata updates for the table before
checking whether there is a problem with duplicated column names. I don't
think you hit the same problem if you are using alter tables. Could you
check whether it's possible to create simple steps to reproduce it?

Thanks,
Sergey

On Tue, Jul 11, 2017 at 3:02 AM, Siddharth Ubale <
siddharth.ub...@syncoms.com> wrote:

> Hi,
>
>
>
> We are using a Phoenix table where we constantly upgrade the structure by
> altering the table and adding new columns.
>
> Almost every 3 days we see that the table becomes unusable via Phoenix
> after some ALTER commands have modified it.
>
> Earlier we were under the impression that a duplicate column gets created,
> causing a metadata issue that Phoenix is unable to handle, based on online
> discussions.
>
> There is also an unresolved issue about the same problem:
>
> https://issues.apache.org/jira/browse/PHOENIX-3196
>
>
>
> Can anyone tell me if they are facing the same and what they have done in
> order to check this occurrence.
>
>
>
> Please find the stack trace for my problem below:
>
>
>
>
>
> Error: org.apache.hadoop.hbase.DoNotRetryIOException: DATAWAREHOUSE3: null
>
> at org.apache.phoenix.util.ServerUtil.createIOException(
> ServerUtil.java:89)
>
> at org.apache.phoenix.coprocessor.
> MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:546)
>
> at org.apache.phoenix.coprocessor.generated.
> MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:16267)
>
> at org.apache.hadoop.hbase.regionserver.HRegion.
> execService(HRegion.java:6001)
>
> at org.apache.hadoop.hbase.regionserver.HRegionServer.
> execServiceOnRegion(HRegionServer.java:3510)
>
> at org.apache.hadoop.hbase.regionserver.HRegionServer.
> execService(HRegionServer.java:3492)
>
> at org.apache.hadoop.hbase.protobuf.generated.
> ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30950)
>
> at org.apache.hadoop.hbase.ipc.
> RpcServer.call(RpcServer.java:2109)
>
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.
> java:101)
>
> at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(
> RpcExecutor.java:130)
>
> at org.apache.hadoop.hbase.ipc.
> RpcExecutor$1.run(RpcExecutor.java:107)
>
> at java.lang.Thread.run(Thread.java:745)
>
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>
>
>
> SQLState:  08000
>
> ErrorCode: 101
>
>
>
>
>
> Thanks,
>
> Siddharth Ubale,
>
>
>


Re: Getting too many open files during table scan

2017-06-23 Thread Sergey Soldatov
You may check "Are there any tips for optimizing Phoenix?" section of
Apache Phoenix FAQ at https://phoenix.apache.org/faq.html. It says how to
pre-split table. In your case you may split on the first letters of
client_id.

When we are talking about monotonous data, we usually mean the primary key
only. For example if we have primary key integer ID and writing something
with auto increment ID, all the data will go to a single region, creating a
hot spot there. In this case (and actually only in this case) salting may
be useful, since it adds an additional random byte in front of primary key,
giving us a chance to distribute the write load across the cluster. In all
other cases salting causes more work on the cluster since we will be unable
to do a single point lookup/range scan by primary key and need to make
lookup for all salting keys + pk.
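
A hedged sketch of a pre-split DDL (the column list is trimmed down and the
split points are invented; they have to match your real client_id
distribution):

CREATE TABLE web_stats (
    client_id VARCHAR NOT NULL,
    dt DATE NOT NULL,
    rule_id VARCHAR NOT NULL,
    requests BIGINT,
    connections BIGINT
    CONSTRAINT pk PRIMARY KEY (client_id, dt, rule_id)
) SPLIT ON ('c', 'g', 'm', 's');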

Thanks,
Sergey


On Fri, Jun 23, 2017 at 12:00 PM, Michael Young  wrote:

> >>Don't you have any other column which is obligatory in queries during
> reading but not monotonous with ingestion?
> We have several columns used in typical query WHERE clauses (like
> userID='abc' or a specific user attributes, data types). However, there are
> a number of columns which are monotonous with many rows having the same
> value.
>
> We have tried running after update STATISTICS on tables, but that would be
> worth investigating again.
>
> Can you give me a hint how to pre-split the data?
>
> Let's say we have the following PK columns (all varchar except dt=date):
> client_id,dt (date),rule_id,user_id,attribute_1,attribute_2,rule_
> name,browser_type,device_type,os_type,page,group_name,period
>
> and non-PK columns in the same table
> requests,connections,queues,queue_time
>
> What would be the suggested way to pre-split?  I'm not familiar with this
> technique beyond very simple use cases.
>
> Thanks!
>
> On Thu, Jun 22, 2017 at 11:31 PM, Ankit Singhal 
> wrote:
>
>> bq. A leading date column is in our schema model:-
>> Don't you have any other column which is obligatory in queries during
>> reading but not monotonous with ingestion? As pre-split can help you
>> avoiding hot-spotting.
>> For parallelism/performance comparison, have you tried running a query on
>> a non-salted table after updating the stats and comparing performance with
>> a salted table?
>>
>>
>> On Fri, Jun 23, 2017 at 9:49 AM, Michael Young 
>> wrote:
>>
>>> We started with no salt buckets, but the performance was terrible in our
>>> testing.
>>>
>>> A leading date column is in our schema model.  We don't seem to be
>>> getting hotspotting after salting.  Date range scans are very common as are
>>> slice and dice on many dimension columns.
>>>
>>> We have tested with a range of SALT values from 0 to 120 for bulk
>>> loading, upserts, selects at different concurrent load levels on a test
>>> cluster before moving to production (with some tweaking post-production).
>>> However, we had fewer average regions per RS during the testing.  The
>>> larger SALT numbers definitely gave overall better performance on our
>>> predominantly read-heavy environment.
>>>
>>> I appreciate any insights to identify bottlenecks.
>>>
>>> On Thu, Jun 22, 2017 at 6:26 PM, James Taylor 
>>> wrote:
>>>
 My recommendation: don't use salt buckets unless you have a
 monatomically increasing row key, for example one that leads with the
 current date/time. Otherwise you'll be putting more load (# of salt buckets
 more load worst case) for bread-and-butter small-range-scan Phoenix 
 queries.

 Thanks,
 James

 On Fri, Jun 23, 2017 at 10:06 AM Michael Young 
 wrote:

> The ulimit open files was only 1024 for the user executing the query.
> After increasing, the queries behaves better.
>
> How can we tell if we need to reduce/increase the number of salt
> buckets?
>
> Our team set this based on read/write performance using data volume
> and expected queries to be run by users.
>
> However, now it seems the performance has degraded.  We can recreate
> the schemas using fewer/more buckets and reload the data, but I haven't
> seen a hard and fast rule for setting the number of buckets.
>
> We have 12 data nodes, 4 SSDs per node, 128 GB Ram per node, 24 core
> w/ hyperthreading (HDP 2.5 running, hbase is primary service).
> and 800+ regions per RS (seems high)
>
> Any orientation on this would be greatly appreciated.
>
>
> On Tue, Jun 20, 2017 at 11:54 AM, Josh Elser 
> wrote:
>
>> I think this is more of an issue of your 78 salt buckets than the
>> width of your table. Each chunk, running in parallel, is spilling
>> incremental counts to disk.
>>
>> I'd check your ulimit settings on the node which you run this query
>> from and try to increase the 

Re: Cant run map-reduce index builder because my view/idx is lower case

2017-06-22 Thread Sergey Soldatov
You may try to build Phoenix with the patch from PHOENIX-3710 applied. That
should fix the problem, I believe.
Thanks,
Sergey

On Mon, Jun 19, 2017 at 11:28 AM, Batyrshin Alexander <0x62...@gmail.com>
wrote:

> Hello again,
>
> Could you, please, help me to run map-reduce for indexing view with
> lower-case name?
>
> Here is my test try on Phoenix-4.8.2:
>
> CREATE TABLE "table" (
> c1 varchar,
> c2 varchar,
> c3 varchar
> CONSTRAINT pk PRIMARY KEY (c1,c2,c3)
> )
>
> CREATE VIEW "table_view"
> AS SELECT * FROM "table" WHERE c3 = 'X';
>
> CREATE INDEX "table_view_idx" ON "table_view" (c2, c1) ASYNC;
>
> sudo -u hadoop ./bin/hbase org.apache.phoenix.mapreduce.index.IndexTool
> --data-table '"table_view"' --index-table '"table_view_idx"' --output-path
> ASYNC_IDX_HFILES
>
> 2017-06-19 21:27:17,716 ERROR [main] index.IndexTool: An exception
> occurred while performing the indexing job: IllegalArgumentException:
>  TABLE_VIEW_IDX is not an index table for TABLE_VIEW  at:
> java.lang.IllegalArgumentException:  TABLE_VIEW_IDX is not an index table
> for TABLE_VIEW
> at org.apache.phoenix.mapreduce.index.IndexTool.run(IndexTool.
> java:190)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.phoenix.mapreduce.index.IndexTool.main(
> IndexTool.java:394)
>
>
> On 17 Jun 2017, at 03:55, Batyrshin Alexander <0x62...@gmail.com> wrote:
>
>  Hello,
> Im trying to build ASYNC index by example from https://phoenix.apache.
> org/secondary_indexing.html
> My issues is that my view name and index name is lower case, so map-reduce
> rise error:
>
> 2017-06-17 03:45:56,506 ERROR [main] index.IndexTool: An exception
> occurred while performing the indexing job: IllegalArgumentException:
>  INVOICES_V4_INDEXED_FUZZY_IDX is not an index table for
> INVOICES_V4_INDEXED_FUZZY
>
>
>


Re: what kind of type in phoenix is suitable for mysql type text?

2017-06-14 Thread Sergey Soldatov
TEXT in MySQL is stored outside the table. HBase has MOB (medium objects)
for that, but Phoenix doesn't support it at the moment. So the only option
is to use VARCHAR. By default HBase allows something like 10 MB in a single
KV pair, but you may change that with the hbase.client.keyvalue.maxsize
property.
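
A minimal sketch of the mapping (a hypothetical table; adjust the names and
primary key to your schema):

CREATE TABLE articles (
    id BIGINT NOT NULL PRIMARY KEY,
    title VARCHAR(255),
    body VARCHAR -- no length limit; plays the role of MySQL TEXT
);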

Thanks,
Sergey

On Wed, Jun 14, 2017 at 7:21 PM, 曾柏棠  wrote:

> Hi,
>   I am using Phoenix 4.7. I am trying to migrate data from MySQL to
> Phoenix, so what type in Phoenix is suitable for the MySQL TEXT type?
>
>  thanks
>


Re: Large CSV bulk load stuck

2017-06-06 Thread Sergey Soldatov
Which version of Phoenix are you using? There were several bugs related to
local indexes and the CSV bulkload in 4.7 and 4.8, I believe. Another thing
I remember causing problems is the RAM size for reducers. It may sound
ridiculous, but using less may help.

Thanks,
Sergey

On Fri, Jun 2, 2017 at 11:13 AM, cmbendre 
wrote:

> Hi,
>
> I need some help in understanding how CsvBulkLoadTool works. I am trying to
> load data ~ 200 GB (There are 100 files of 2 GB each) from hdfs to Phoenix
> with 1 master and 4 region-servers. These region servers have 32 GB RAM and
> 16 cores each. Total HDFS disk space is 4 TB.
>
> The table is salted with 16. So 4 regions per regionservers. There are 400
> columns and more than 30 local indexes.
>
> Here is the command i am using -
> /HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/usr/lib/hbase/conf
> hadoop jar /usr/lib/phoenix/phoenix-client.jar
> org.apache.phoenix.mapreduce.CsvBulkLoadTool -Dfs.permissions.umask-mode=
> 000
> --table TABLE_SNAPSHOT --input /user/table/*.csv/
>
> The job proceeds normally but gets stuck at reduce phase around 90 %. I
> also
> observed that initially it was using full resource of the cluster but it
> uses much less resources near completion. (10 percent of RAM and cores).
>
> What exactly is happening behind the scenes ? How i can tune it to work
> faster ? I am using HBase + HDFS deployed on YARN on AWS.
>
> Any help is appreciated.
>
> Thanks
> Chaitanya
>
>
>
>
>


Re: Delete from Array

2017-06-06 Thread Sergey Soldatov
From the Apache Phoenix documentation:


   - Partial update of an array is currently not possible. Instead, the
   array may be manipulated on the client-side and then upserted back in its
   entirety.
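
In practice that means reading the array, rebuilding it on the client
without the unwanted elements, and writing the whole value back. A hedged
sketch with a hypothetical table whose persons column is an INTEGER ARRAY:

-- the new value replaces the stored array entirely
UPSERT INTO person_friends (id, persons) VALUES (42, ARRAY[1, 2, 4]);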

Thanks,
Sergey

On Mon, Jun 5, 2017 at 7:25 PM, Cheyenne Forbes <
cheyenne.osanu.for...@gmail.com> wrote:

> Can I delete elements from Phoenix arrays?
>
> Regards,
>
> Cheyenne O. Forbes
>


Re: Class org.apache.phoenix.mapreduce.bulkload.TableRowkeyPair not found

2017-06-01 Thread Sergey Soldatov
You may try to remove mapredcp and keep /etc/hbase/conf in the
HADOOP_CLASSPATH.


Thanks,
Sergey

On Thu, Jun 1, 2017 at 12:59 AM, cmbendre 
wrote:

> Trying to bulk load CSV file on Phoenix 4.9.0 on EMR.
>
> Following is the command -
>
> /export HADOOP_CLASSPATH=$(hbase mapredcp):/usr/lib/hbase/conf
>
> hadoop jar /usr/lib/phoenix/phoenix-client.jar
> org.apache.phoenix.mapreduce.CsvBulkLoadTool -Dfs.permissions.umask-mode=
> 000
> --table PROFILESTORE --input /user/merged3.csv/
>
> But it throws the following error-
> Exception in thread "main" java.lang.RuntimeException:
> java.lang.RuntimeException: /java.lang.ClassNotFoundException: Class
> org.apache.phoenix.mapreduce.bulkload.TableRowkeyPair not found
> at org.apache.hadoop.conf.Configuration.getClass(
> Configuration.java:2227)
> at org.apache.hadoop.mapred.JobConf.getMapOutputKeyClass(
> JobConf.java:813)
> at
> org.apache.hadoop.mapreduce.task.JobContextImpl.getMapOutputKeyClass(
> JobContextImpl.java:142)
> at
> org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.addDependencyJars(
> TableMapReduceUtil.java:832)
> at
> org.apache.phoenix.mapreduce.MultiHfileOutputFormat.
> configureIncrementalLoad(MultiHfileOutputFormat.java:698)
> at
> org.apache.phoenix.mapreduce.AbstractBulkLoadTool.submitJob(
> AbstractBulkLoadTool.java:301)
> at
> org.apache.phoenix.mapreduce.AbstractBulkLoadTool.loadData(
> AbstractBulkLoadTool.java:270)
> at
> org.apache.phoenix.mapreduce.AbstractBulkLoadTool.run(
> AbstractBulkLoadTool.java:183)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at
> org.apache.phoenix.mapreduce.CsvBulkLoadTool.main(
> CsvBulkLoadTool.java:101)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
> 62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException:
> Class org.apache.phoenix.mapreduce.bulkload.TableRowkeyPair not found
> at org.apache.hadoop.conf.Configuration.getClass(
> Configuration.java:2195)
> at org.apache.hadoop.conf.Configuration.getClass(
> Configuration.java:2219)
> ... 16 more
> Caused by: java.lang.ClassNotFoundException: Class
> org.apache.phoenix.mapreduce.bulkload.TableRowkeyPair not found
> at
> org.apache.hadoop.conf.Configuration.getClassByName(
> Configuration.java:2101)
> at org.apache.hadoop.conf.Configuration.getClass(
> Configuration.java:2193)
> ... 17 more/
>
> When i unset the HADOOP_CLASSPATH like suggested in this JIRA -
> https://issues.apache.org/jira/browse/PHOENIX-3835
>
> This shows another error -
>
> /Error: java.lang.RuntimeException: java.sql.SQLException:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the
> locations
> at
> org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.setup(
> FormatToBytesWritableMapper.java:142)
> at
> org.apache.phoenix.mapreduce.CsvToKeyValueMapper.setup(
> CsvToKeyValueMapper.java:67)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:796)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1698)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.sql.SQLException:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the
> locations
> at
> org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(
> ConnectionQueryServicesImpl.java:2432)
> at
> org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(
> ConnectionQueryServicesImpl.java:2352)
> at
> org.apache.phoenix.util.PhoenixContextExecutor.call(
> PhoenixContextExecutor.java:76)
> at
> org.apache.phoenix.query.ConnectionQueryServicesImpl.init(
> ConnectionQueryServicesImpl.java:2352)
> at
> org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(
> PhoenixDriver.java:232)
> at
> org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(
> PhoenixEmbeddedDriver.java:147)
> at org.apache.phoenix.jdbc.PhoenixDriver.connect(
> PhoenixDriver.java:202)
> at 

Re: Async Index Creation fails due to permission issue

2017-05-26 Thread Sergey Soldatov
Try to create a directory that is accessible to everyone (777) and point
the output directory there (e.g. --output-path
/temp/MYTABLE_GLOBAL_INDEX_HFILE).
Could you also provide a bit more information: whether you are using
Kerberos, and the versions of HDFS/HBase/Phoenix?

Thanks,
Sergey

On Tue, May 23, 2017 at 10:51 AM, anil gupta  wrote:

> I think you need to run the tool as "hbase" user.
>
> On Tue, May 23, 2017 at 5:43 AM, cmbendre 
> wrote:
>
>> I created an ASYNC index and ran the IndexTool Map-Reduce job to populate
>> it.
>> Here is the command i used
>>
>> hbase org.apache.phoenix.mapreduce.index.IndexTool --data-table MYTABLE
>> --index-table MYTABLE_GLOBAL_INDEX --output-path
>> MYTABLE_GLOBAL_INDEX_HFILE
>>
>> I can see that index HFiles are created successfully on HDFS but then the
>> job fails due to permission errors. The files are created as "hadoop"
>> user,
>> and they do not have any permission for "hbase" user. Here is the error i
>> get -
>>
>> /Caused by:
>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.
>> security.AccessControlException):
>> Permission denied: user=hbase, access=EXECUTE,
>> inode="/user/hadoop/MYTABLE_GLOBAL_INDEX_HFILE/MYTABLE_GLOBA
>> L_INDEX/0/a4c9888f8e284158bfb79b30b2cdee82":hadoop:hadoop:drwxrwx---
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.c
>> heck(FSPermissionChecker.java:320)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.c
>> heckTraverse(FSPermissionChecker.java:259)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.c
>> heckPermission(FSPermissionChecker.java:205)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.c
>> heckPermission(FSPermissionChecker.java:190)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPerm
>> ission(FSDirectory.java:1728)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPerm
>> ission(FSDirectory.java:1712)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPath
>> Access(FSDirectory.java:1686)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlock
>> LocationsInt(FSNamesystem.java:1830)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlock
>> Locations(FSNamesystem.java:1799)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlock
>> Locations(FSNamesystem.java:1712)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.get
>> BlockLocations(NameNodeRpcServer.java:588)
>> at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServ
>> erSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolS
>> erverSideTranslatorPB.java:365)
>> at
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocol
>> Protos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNam
>> enodeProtocolProtos.java)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcIn
>> voker.call(ProtobufRpcEngine.java:616)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGro
>> upInformation.java:1698)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)/
>>
>> The hack i am using right now is to set the permissions manually for these
>> files when the IndexTool job is running. Is there a better way ?
>>
>>
>>
>>
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>


Re: Row timestamp usage

2017-05-22 Thread Sergey Soldatov
AFAIK, depending on the version of Phoenix you are using, you may
experience problems with the MR bulk load or with indexes, and possibly some
other 'side effects'; try searching the JIRAs for 'ROW TIMESTAMP'. There is
no way to alter the column type other than dropping and recreating the
column.

Thanks,
Sergey


On Mon, May 22, 2017 at 3:45 PM, Michael Young  wrote:

> I am using a DATE type column as one of the leading columns in my PK and I
> am defining it as "ROW TIMESTAMP" to take advantage of the optimizations
> mentioned here:https://phoenix.apache.org/rowtimestamp.html
>
> Are there any disadvantages to using this feature?  My PK has 20+ columns
> (queries are done over date ranges so I am interested in any optimizations
> which help such queries).  The value is set on UPSERT to the daily value,
> the hour/minutes aren't really needed for my use cases so I just use
> midnight 00:00.000 (eg. 2017-01-01 00:00:00.000).
>
> Once it's set, is there a way to alter the column type to be a regular
> DATE type?  Or would I need to recreate the table?
>
> Just wondering out of curiosity in case there are instances where I should
> not be using this feature.
>
> Cheers,
> -Michael
>
>


Re: ORDER BY not working with UNION ALL

2017-05-10 Thread Sergey Soldatov
Well, even if you don't use the column family prefix, you will get an error
that the column date_time is undefined. Consider the result of UNION ALL as a
separate table; the ORDER BY is applied to that table, and it has no date_time
column. Don't forget that UNION ALL may combine different tables, so there may
be no date_time column in the second table at all.
So you need a construction where the ordering column is selected in both
branches, something like:
select name from (select p.name, f.date_time ... UNION ALL select p.name,
f.date_time ... ORDER BY date_time LIMIT 20)
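
Applied to the query in the message below, an (untested) sketch would be
roughly:

SELECT name FROM (
    SELECT p.name, f.date_time
    FROM person p JOIN friends f ON f.person = p.id
    WHERE 567 != ANY(f.persons)
    UNION ALL
    SELECT p.name, f.date_time
    FROM person p JOIN friends f ON f.person = p.id
    WHERE 123 != ANY(f.persons)
    ORDER BY date_time
    LIMIT 20
)

i.e. both branches expose date_time, so the ORDER BY over the union result has
a column to refer to.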

Thanks,
Sergey

On Wed, May 3, 2017 at 4:57 PM, Cheyenne Forbes <
cheyenne.osanu.for...@gmail.com> wrote:

>  I get "Undefined column family. familyName=f" whenever I run the
> following query,it works without the ORDER BY and works with the ORDER BY
> if its not a union and just one select statement
>
>SELECT
>   p.name
> FROM
>   person p
> JOIN
>   friends f
> ON
>   f.person = p.id
> WHERE
>   567 != ANY(f.persons)
> UNION ALL
> SELECT
>   p.name
> FROM
>   person p
> JOIN
>   friends f
> ON
>   f.person = p.id
> WHERE
>   123 != ANY(f.persons)
> ORDER BY f.date_time LIMIT 20
>
> Regards,
>
> Cheyenne O. Forbes
>


Re: How can I "use" a hbase co-processor from a User Defined Function?

2017-04-19 Thread Sergey Soldatov
How do you handle HBase region splits and merges with such an architecture?

Thanks,
Sergey

On Wed, Apr 19, 2017 at 9:22 AM, Cheyenne Forbes <
cheyenne.osanu.for...@gmail.com> wrote:

> I created a hbase co-processor that stores/deletes text indexes with
> Lucene, the indexes are stored on HDFS (for back up, replication, etc.).
> The indexes "mirror" the regions so if the index for a column is at
> "hdfs://localhost:9000/hbase/region_name" the index is stored at
> "hdfs://localhost:9000/lucene/region_name". I did this just in case I
> needed to delete (or other operation) an entire region for which ever
> reason. The id of the row, the column and query are passed to a Lucene
> BooleanQuery to get a search score to use to sort the data
> "SEARCH_SCORE(primary_key, text_column_name, search_query)". So I am trying
> to find a way to get "HRegion" of the region server the code is running on
> to either *1.* get the region name and the hadoop FileSystem or *2. *get
> access to the co-processor on that server which already have the values in
> option *1*
>
> Regards,
>
> Cheyenne O. Forbes
>
>
>
> On Wed, Apr 19, 2017 at 10:59 AM, James Taylor <jamestay...@apache.org>
> wrote:
>
>> Can you describe the functionality you're after at a high level in terms
>> of a use case (rather than an implementation idea/detail) and we can
>> discuss any options wrt potential new features?
>>
>> On Wed, Apr 19, 2017 at 8:53 AM Cheyenne Forbes <
>> cheyenne.osanu.for...@gmail.com> wrote:
>>
>>> I'd still need " *HRegion MyVar; ", *because I'd still need the name of
>>> the region where the row of the id passed to the UDF is located and the
>>> value returned my* "getFilesystem()" *of* "**HRegion", *what do you
>>> recommend that I do?
>>>
>>> Regards,
>>>
>>> Cheyenne O. Forbes
>>>
>>>
>>>
>>> On Tue, Apr 18, 2017 at 6:27 PM, Sergey Soldatov <
>>> sergeysolda...@gmail.com> wrote:
>>>
>>>> I mean you need to modify Phoenix code itself to properly support such
>>>> kind of features.
>>>>
>>>> Thanks,
>>>> Sergey
>>>>
>>>> On Tue, Apr 18, 2017 at 3:52 PM, Cheyenne Forbes <
>>>> cheyenne.osanu.for...@gmail.com> wrote:
>>>>
>>>>> Could you explain a little more what you mean by that?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Cheyenne O. Forbes
>>>>>
>>>>>
>>>>> On Tue, Apr 18, 2017 at 4:36 PM, Sergey Soldatov <
>>>>> sergeysolda...@gmail.com> wrote:
>>>>>
>>>>>> I may be wrong, but you have chosen wrong approach. Such kind of
>>>>>> integration need to be (should be) done on the Phoenix layer in the way
>>>>>> like global/local indexes are implemented.
>>>>>>
>>>>>> Thanks,
>>>>>> Sergey
>>>>>>
>>>>>> On Tue, Apr 18, 2017 at 12:34 PM, Cheyenne Forbes <
>>>>>> cheyenne.osanu.for...@gmail.com> wrote:
>>>>>>
>>>>>>> I am creating a plugin that uses Lucene to index text fields and I
>>>>>>> need to access *getConf()* and *getFilesystem()* of *HRegion, *the
>>>>>>> Lucene indexes are split with the regions so I need  " *HRegion
>>>>>>> MyVar; ", *I am positive the UDF will run on the region server and
>>>>>>> not the client*.*
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Cheyenne O. Forbes
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Apr 18, 2017 at 1:22 PM, James Taylor <
>>>>>>> jamestay...@apache.org> wrote:
>>>>>>>
>>>>>>>> Shorter answer is "no". Your UDF may be executed on the client side
>>>>>>>> as well (depending on the query) and there is of course no HRegion
>>>>>>>> available from the client.
>>>>>>>>
>>>>>>>> On Tue, Apr 18, 2017 at 11:10 AM Sergey Soldatov <
>>>>>>>> sergeysolda...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Well, theoretically there is a way of having a coprocessor that
>>>>>>>>> will keep static public map of current rowkey processed by Phoenix 
>>>>>>>>> and the
>>>>>>>>> correlated HRegion instance and get this HRegion using the key that is
>>>>>>>>> processed by evaluate function. But it's a completely wrong approach 
>>>>>>>>> for
>>>>>>>>> both HBase and Phoenix. And it's not clear for me why SQL query may 
>>>>>>>>> need
>>>>>>>>> access to the region internals.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Sergey
>>>>>>>>>
>>>>>>>>> On Mon, Apr 17, 2017 at 10:04 PM, Cheyenne Forbes <
>>>>>>>>> cheyenne.osanu.for...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> so there is no way of getting HRegion in a UDF?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>


Re: How can I "use" a hbase co-processor from a User Defined Function?

2017-04-18 Thread Sergey Soldatov
I mean that you would need to modify the Phoenix code itself to properly
support this kind of feature.

Thanks,
Sergey

On Tue, Apr 18, 2017 at 3:52 PM, Cheyenne Forbes <
cheyenne.osanu.for...@gmail.com> wrote:

> Could you explain a little more what you mean by that?
>
> Regards,
>
> Cheyenne O. Forbes
>
>
> On Tue, Apr 18, 2017 at 4:36 PM, Sergey Soldatov <sergeysolda...@gmail.com
> > wrote:
>
>> I may be wrong, but you have chosen wrong approach. Such kind of
>> integration need to be (should be) done on the Phoenix layer in the way
>> like global/local indexes are implemented.
>>
>> Thanks,
>> Sergey
>>
>> On Tue, Apr 18, 2017 at 12:34 PM, Cheyenne Forbes <
>> cheyenne.osanu.for...@gmail.com> wrote:
>>
>>> I am creating a plugin that uses Lucene to index text fields and I need
>>> to access *getConf()* and *getFilesystem()* of *HRegion, *the Lucene
>>> indexes are split with the regions so I need  " *HRegion MyVar; ", *I
>>> am positive the UDF will run on the region server and not the client*.*
>>>
>>> Regards,
>>>
>>> Cheyenne O. Forbes
>>>
>>>
>>> On Tue, Apr 18, 2017 at 1:22 PM, James Taylor <jamestay...@apache.org>
>>> wrote:
>>>
>>>> Shorter answer is "no". Your UDF may be executed on the client side as
>>>> well (depending on the query) and there is of course no HRegion available
>>>> from the client.
>>>>
>>>> On Tue, Apr 18, 2017 at 11:10 AM Sergey Soldatov <
>>>> sergeysolda...@gmail.com> wrote:
>>>>
>>>>> Well, theoretically there is a way of having a coprocessor that will
>>>>> keep static public map of current rowkey processed by Phoenix and the
>>>>> correlated HRegion instance and get this HRegion using the key that is
>>>>> processed by evaluate function. But it's a completely wrong approach for
>>>>> both HBase and Phoenix. And it's not clear for me why SQL query may need
>>>>> access to the region internals.
>>>>>
>>>>> Thanks,
>>>>> Sergey
>>>>>
>>>>> On Mon, Apr 17, 2017 at 10:04 PM, Cheyenne Forbes <
>>>>> cheyenne.osanu.for...@gmail.com> wrote:
>>>>>
>>>>>> so there is no way of getting HRegion in a UDF?
>>>>>>
>>>>>
>>>>>
>>>
>>
>


Re: Are arrays stored and retrieved in the order they are added to phoenix?

2017-04-18 Thread Sergey Soldatov
Of course they are stored in the same order, but using a special encoding.
It's explained in PArrayDataType:

/**
 * The datatype for PColummns that are Arrays. Any variable length
array would follow the below order. Every element
 * would be seperated by a seperator byte '0'. Null elements are
counted and once a first non null element appears we
 * write the count of the nulls prefixed with a seperator byte.
Trailing nulls are not taken into account. The last non
 * null element is followed by two seperator bytes. For eg a, b, null,
null, c, null -> 65 0 66 0 0 2 67 0 0 0 a null
 * null null b c null d -> 65 0 0 3 66 0 67 0 0 1 68 0 0 0. The reason
we use this serialization format is to allow the
 * byte array of arrays of the same type to be directly comparable
against each other. This prevents a costly
 * deserialization on compare and allows an array column to be used as
the last column in a primary key constraint.
 */


On Thu, Apr 13, 2017 at 8:19 AM, Cheyenne Forbes <
cheyenne.osanu.for...@gmail.com> wrote:

> I was wonder if the arrays are stored in the order I add them or they are
> sorted otherwise (maybe for performance reasons)
>


Re: How can I "use" a hbase co-processor from a User Defined Function?

2017-04-18 Thread Sergey Soldatov
I may be wrong, but you have chosen the wrong approach. This kind of
integration needs to be (should be) done at the Phoenix layer, in the way
global/local indexes are implemented.

Thanks,
Sergey

On Tue, Apr 18, 2017 at 12:34 PM, Cheyenne Forbes <
cheyenne.osanu.for...@gmail.com> wrote:

> I am creating a plugin that uses Lucene to index text fields and I need to
> access *getConf()* and *getFilesystem()* of *HRegion, *the Lucene indexes
> are split with the regions so I need  " *HRegion MyVar; ", *I am positive
> the UDF will run on the region server and not the client*.*
>
> Regards,
>
> Cheyenne O. Forbes
>
>
> On Tue, Apr 18, 2017 at 1:22 PM, James Taylor <jamestay...@apache.org>
> wrote:
>
>> Shorter answer is "no". Your UDF may be executed on the client side as
>> well (depending on the query) and there is of course no HRegion available
>> from the client.
>>
>> On Tue, Apr 18, 2017 at 11:10 AM Sergey Soldatov <
>> sergeysolda...@gmail.com> wrote:
>>
>>> Well, theoretically there is a way of having a coprocessor that will
>>> keep static public map of current rowkey processed by Phoenix and the
>>> correlated HRegion instance and get this HRegion using the key that is
>>> processed by evaluate function. But it's a completely wrong approach for
>>> both HBase and Phoenix. And it's not clear for me why SQL query may need
>>> access to the region internals.
>>>
>>> Thanks,
>>> Sergey
>>>
>>> On Mon, Apr 17, 2017 at 10:04 PM, Cheyenne Forbes <
>>> cheyenne.osanu.for...@gmail.com> wrote:
>>>
>>>> so there is no way of getting HRegion in a UDF?
>>>>
>>>
>>>
>


Re: Problem connecting JDBC client to a secure cluster

2017-04-17 Thread Sergey Soldatov
That's not a case of hbase-site.xml being loaded incorrectly. This is the
behavior of the Java classpath: it accepts only jars and directories. So if
any resources other than jars should be added to the classpath, you need to
add the directory where they are located to the classpath.

Thanks,
Sergey

On Tue, Apr 11, 2017 at 10:15 AM, rafa  wrote:

> Hi all,
>
>
> I have been able to track down the origin of the problem and it is
> related to the hbase-site.xml not being loaded correctly by the application
> server.
>
> Seeing the instructions given by Anil in this JIRA:
> https://issues.apache.org/jira/browse/PHOENIX-19 it has been easy to
> reproduce it
>
> java   -cp
> /tmp/testhbase2:/opt/cloudera/parcels/CLABS_PHOENIX-4.7.0-1.
> clabs_phoenix1.3.0.p0.000/lib/phoenix/lib/hadoop-hdfs-2.6.0-
> cdh5.7.0.jar:/opt/cloudera/parcels/CLABS_PHOENIX-4.7.0-1.
> clabs_phoenix1.3.0.p0.000/lib/phoenix/phoenix-4.7.0-clabs-phoenix1.3.0-client.jar
> sqlline.SqlLine -d org.apache.phoenix.jdbc.PhoenixDriver  -u
> jdbc:phoenix:node-01u..int:2181:phoe...@hadoop.int:/
> etc/security/keytabs/phoenix.keytab  -n none -p none --color=true
> --fastConnect=false --verbose=true  --incremental=false
> --isolation=TRANSACTION_READ_COMMITTED
>
> In /tmp/testhbase2 there are 3 files:
>
> -rw-r--r--   1 root root  4027 Apr 11 18:23 hdfs-site.xml
> -rw-r--r--   1 root root  3973 Apr 11 18:29 core-site.xml
> -rw-rw-rw-   1 root root  3924 Apr 11 18:49 hbase-site.xml
>
>
> a) If hdfs-site.xml is missing or invalid:
>
> It fails with Caused by: java.lang.IllegalArgumentException:
> java.net.UnknownHostException: nameservice1
>
> (with HA HDFS, hdfs-site.xml  is needed to resolve the name service)
>
> b) if core-site.xml is missing or invalid:
>
>  17/04/11 19:05:01 WARN security.UserGroupInformation:
> PriviledgedActionException as:root (auth:SIMPLE) 
> cause:javax.security.sasl.SaslException:
> GSS initiate failed [Caused by GSSException: No valid credentials provided
> (Mechanism level: Failed to find any Kerberos tgt)]
> 17/04/11 19:05:01 WARN ipc.RpcClientImpl: Exception encountered while
> connecting to the server : javax.security.sasl.SaslException: GSS
> initiate failed [Caused by GSSException: No valid credentials provided
> (Mechanism level: Failed to find any Kerberos tgt)]
> 17/04/11 19:05:01 FATAL ipc.RpcClientImpl: SASL authentication failed. The
> most likely cause is missing or invalid credentials. Consider 'kinit'.
> javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Failed to
> find any Kerberos tgt)]
> at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(
> GssKrb5Client.java:211)
> at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.
> saslConnect(HBaseSaslRpcClient.java:181)
>
> ...
> Caused by: GSSException: No valid credentials provided (Mechanism level:
> Failed to find any Kerberos tgt)
> at sun.security.jgss.krb5.Krb5InitCredential.getInstance(
> Krb5InitCredential.java:147)
> at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(
> Krb5MechFactory.java:121)
> at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(
> Krb5MechFactory.java:187)
> at sun.security.jgss.GSSManagerImpl.getMechanismContext(
> GSSManagerImpl.java:223)
> at sun.security.jgss.GSSContextImpl.initSecContext(
> GSSContextImpl.java:212)
> at sun.security.jgss.GSSContextImpl.initSecContext(
> GSSContextImpl.java:179)
> at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(
> GssKrb5Client.java:192)
>
>
>
> c) If hbase-site.xml is missing or invalid:
>
> The zookeeeper connection works right, but not the Hbase master one:
>
> java  -cp /tmp/testhbase2:/opt/cloudera/parcels/CLABS_PHOENIX-4.7.0-1.
> clabs_phoenix1.3.0.p0.000/lib/phoenix/lib/hadoop-hdfs-2.6.0-
> cdh5.7.0.jar:/opt/cloudera/parcels/CLABS_PHOENIX-4.7.0-1.
> clabs_phoenix1.3.0.p0.000/lib/phoenix/phoenix-4.7.0-clabs-phoenix1.3.0-client.jar
> sqlline.SqlLine -d org.apache.phoenix.jdbc.PhoenixDriver  -u
> jdbc:phoenix:node-01u..int:2181:phoe...@hadoop.int:/
> etc/security/keytabs/phoenix.keytab  -n none -p none --color=true
> --fastConnect=false --verbose=true  --incremental=false
> --isolation=TRANSACTION_READ_COMMITTED
> Setting property: [incremental, false]
> Setting property: [isolation, TRANSACTION_READ_COMMITTED]
> issuing: !connect jdbc:phoenix:node-01u..int:2181:phoe...@hadoop.int:/
> etc/security/keytabs/phoenix.keytab none none org.apache.phoenix.jdbc.
> PhoenixDriver
> Connecting to jdbc:phoenix:node-01u..int:2181:phoe...@hadoop.int:/
> etc/security/keytabs/phoenix.keytab
> 17/04/11 19:06:38 INFO query.ConnectionQueryServicesImpl: Trying to
> connect to a secure cluster with keytab:/etc/security/keytabs/
> phoenix.keytab
> 17/04/11 19:06:38 INFO security.UserGroupInformation: Login successful for
> user 

Re: How can I "use" a hbase co-processor from a User Defined Function?

2017-04-17 Thread Sergey Soldatov
No. A UDF doesn't have any context about where it's executed, so it can
obtain neither the region instance nor the coprocessor instance.

Thanks,
Sergey

On Fri, Apr 14, 2017 at 10:39 AM, Cheyenne Forbes <
cheyenne.osanu.for...@gmail.com> wrote:

> would *my_udf* be executed on the region server that the row of the
> column that is passed to it is located on ?
>


Re: phoenix client config and memory

2017-03-08 Thread Sergey Soldatov
Hi,
You may specify the HBase config by setting the HBASE_CONF_DIR env variable,
or just put the directory that contains it on the classpath. If no
hbase-site.xml is found, the default values will be used.
As for memory, it really depends on the usage scenario. If you have large
tables with thousands of regions, you may need to increase the number of
working threads, and in that case the driver will of course require more
memory.

Thanks,
Sergey

On Tue, Mar 7, 2017 at 9:33 AM, Pradheep Shanmugam <
pradheep.shanmu...@infor.com> wrote:

> Hi,
>
>
>1. When using phoenix thick client(4.4.0) at the application, where
>does the client hbase-site.xml reside..I don’t see one? Or does it pull the
>hbase-site.xml from the server before starting up?I do see the phoenix
>query time out set in the server side habse-site though it is client
>setting.
>2. When using phoenix thick client,  does the client need extra memory
>to be allocated in general to run the queries and when it does operations
>like client merge sort. With mutiple such queries runnign at the same time?
>
>
> Thanks,
> Pradheep
>


Re: How to migrate sql cascade and foreign keys

2017-03-08 Thread Sergey Soldatov
Well, Apache Phoenix doesn't support foreign keys, so you need to manage this
functionality in your application layer. Sometimes, depending on the scenario,
you may emulate it using VIEWs over a user table with additional columns
instead of creating a set of separate tables. More information about views can
be found at
https://phoenix.apache.org/views.html
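
As a rough sketch (hypothetical tables and names), a single physical table
plus one view per entity type can stand in for a parent table and its
children:

CREATE TABLE base_entity (
    id   VARCHAR NOT NULL PRIMARY KEY,
    kind VARCHAR,
    colx VARCHAR
);

CREATE VIEW entity_a (extra_a VARCHAR) AS
    SELECT * FROM base_entity WHERE kind = 'A';

CREATE VIEW entity_b (extra_b VARCHAR) AS
    SELECT * FROM base_entity WHERE kind = 'B';

The referential actions (ON DELETE CASCADE, etc.) still have to be enforced by
the application.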


Thanks,
Sergey

On Thu, Mar 2, 2017 at 5:32 AM, mferlay  wrote:

> Hi everybody,
> I need to migrate an sql script which creates all our database in apache
> phoenix format. I do not manage to translate foreign keys and delete
> cascade
> from this following sample:
>
> CREATE TABLE IF NOT EXISTS A(
>   ID VARCHAR(255) NOT NULL,
>   colX VARCHAR(255) NULL,
>   PRIMARY KEY(ID)
> );
>
> CREATE TABLE IF NOT EXISTS B(
>   ID VARCHAR(255) NOT NULL,
>   colY VARCHAR(255) NOT NULL,
>   PRIMARY KEY(colY)
> );
>
> CREATE TABLE IF NOT EXISTS C(
>   colY VARCHAR(255) NOT NULL,
>   ID VARCHAR(255) NOT NULL,
>   PRIMARY KEY(ID,colY),
>   FOREIGN KEY(ID)
> REFERENCES A(ID)
>   ON DELETE CASCADE
>   ON UPDATE NO ACTION,
>   FOREIGN KEY(colY)
> REFERENCES B(colY)
>   ON DELETE CASCADE
>   ON UPDATE NO ACTION
> )
>
>
> thanks for your help,
>
> Regards
>
>
>
> --
> View this message in context: http://apache-phoenix-user-
> list.1124778.n5.nabble.com/How-to-migrate-sql-cascade-
> and-foreign-keys-tp3226.html
> Sent from the Apache Phoenix User List mailing list archive at Nabble.com.
>


Re: Read only user permissions to Phoenix table - Phoenix 4.5

2017-02-16 Thread Sergey Soldatov
Unfortunately, some versions of the Phoenix client use HBase APIs (such as
getHTableDescriptor) that require HBase CREATE/ADMIN permissions on the system
tables. Moreover, the upgrade path tries to create the system tables in order
to check whether the system requires an upgrade, and that may fail with a
permission exception (which is your case). Most of those problems should be
gone in 4.9, where the upgrade is a manual operation. So there is no easy way
to avoid this problem without patching the sources (the patch itself is
obvious, though).

Thanks,
Sergey

On Thu, Feb 16, 2017 at 2:03 AM, Pedro Boado  wrote:

> Hi all,
>
> I have a quick question. We are still running on Phoenix 4.5 (I know, it's
> not my fault) and we're trying to setup a read only user on a phoenix
> table. The minimum set of permissions to get access through sqlline is
>
> grant 'readonlyuser' , 'RXC', 'SYSTEM.CATALOG'
> grant 'readonlyuser' , 'RXC', 'SYSTEM.SEQUENCE'
> grant 'readonlyuser' , 'RXC', 'SYSTEM.STATS'
> grant 'readonlyuser' , 'RXC', 'SYSTEM.FUNCTION'
> grant 'readonlyuser' , 'RX', 'READONLY.TABLENAME'
>
> I was wondering whether there is a way for avoiding the need of the CREATE
> permission on catalog tables.
>
> Cheers,
> Pedro.
>
>


Re: Statistics collection in Phoenix

2016-11-02 Thread Sergey Soldatov
Hi Mich,
The statistics are stored in the SYSTEM.STATS table. And yes, there are
guideposts per column family. As for (3) and (4), I think the answer is no.
Guideposts are markers at specific row keys (so when we scan for a specific
key range we can quickly find where to start scanning), and they let us run
more scans in parallel. And they are used on the client side.
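
For example (table name is hypothetical), statistics are re-collected with:

UPDATE STATISTICS my_table;

and the collected guideposts can be inspected directly (column names may vary
slightly between versions):

SELECT * FROM SYSTEM.STATS WHERE PHYSICAL_NAME = 'MY_TABLE';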

Thanks,
Sergey

On Sun, Oct 30, 2016 at 3:55 PM, Mich Talebzadeh 
wrote:

> According to document 
>
> The UPDATE STATISTICS command updates the statistics collected on a table,
> to improve query performance. This command collects a set of keys per
> region per column family that are equal byte distanced from each other.
> These collected keys are called *guideposts* and they act as
> *hints/guides* to improve the parallelization of queries on a given
> target region.
>
> Few questions I Have
>
>
>1. Where are the statistics for a given table is kept
>2. Does this mean that each column family of  a table has its own
>statistics
>3. Is statistics collected similar to statistics for store-index in
>Hive ORC table
>4. Can statistics been used in predicate push down
>
>
> Thanks
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>


Re: Phoenix Slow Problem

2016-11-02 Thread Sergey Soldatov
Hi Fawaz,
Actually, the explain plan says there will be 6 parallel full scans. I
believe that's the number of regions you have. If you want to increase the
number of parallel scans, you may consider setting
phoenix.stats.guidepost.width to something smaller than the default value, so
that scans are executed over smaller chunks (and complete faster), or
splitting the table to increase the number of regions.
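
For example (value and table name are hypothetical; check the
update_statistics page for the exact syntax your version supports), the
guidepost width can be lowered while re-collecting statistics:

UPDATE STATISTICS my_table SET "phoenix.stats.guidepost.width" = 10000000;

A smaller width produces more guideposts and therefore more, smaller parallel
chunks per region.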

Thanks,
Sergey

On Mon, Oct 31, 2016 at 4:19 PM, Fawaz Enaya 
wrote:

> Thanks for your answer but why it gives 1 way parallel and can not be more?
>
>
> On Sunday, 30 October 2016, Mich Talebzadeh 
> wrote:
>
>> If you create a secondary index in Phoenix on the table on single or
>> selected columns, that index (which will be added to Hbase) will be used to
>> return data. For example in below MARKETDATAHBASE_IDX1 is an index on table
>> MARKETDATAHBASE and is used by the query
>>
>>
>>  0: jdbc:phoenix:rhes564:2181> EXPLAIN select count(1) from
>> MARKETDATAHBASE;
>> ++
>> |PLAN|
>> ++
>> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER MARKETDATAHBASE_IDX1  |
>> |* SERVER FILTER BY FIRST KEY ONLY*|
>> | SERVER AGGREGATE INTO SINGLE ROW   |
>> ++
>>
>> HTH
>>
>>
>>
>>
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 30 October 2016 at 11:42, Fawaz Enaya  wrote:
>>
>>> Hi All in this great project,
>>>
>>>
>>> I have an HBase cluster of four nodes, I use Phoenix to access HBase,
>>> but I do not know why its too much slow to execute SELECT count(*) for
>>> table contains 5 million records it takes 8 seconds.
>>> Below is the explain for may select statement
>>>
>>> CLIENT 6-CHUNK 9531695 ROWS 629145639 BYTES PARALLEL 1-WAY FULL SCAN
>>> OVER TABLE* |*
>>>
>>> *| *SERVER FILTER BY FIRST KEY ONLY
>>>  * |*
>>>
>>> *| *SERVER AGGREGATE INTO SINGLE ROW
>>> Anyone can help.
>>>
>>> Many Thanks
>>> --
>>> Thanks & regards,
>>>
>>>
>>
>
> --
> --
> Thanks & regards,
>
>
>


Re: Phoenix + Spark

2016-10-26 Thread Sergey Soldatov
deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> thanks
>
>
> 2016-10-26 15:09 GMT+08:00 Sergey Soldatov <sergeysolda...@gmail.com>:
>
>> (1) You need only client jar (phoenix--client.jar)
>> (2) set spark.executor.extraClassPath in the spark-defaults.conf to the
>> client jar
>> Hope that would help.
>>
>> Thanks,
>> Sergey
>>
>> On Tue, Oct 25, 2016 at 9:31 PM, min zou <zoumin1...@gmail.com> wrote:
>>
>>> Dear, i use spark to do data analysis,then save the result to Phonix.
>>> When i run the application on Intellij IDEA by local model, the apllication
>>> runs ok, but i run it by spark-submit(spark-submit --class
>>> com.bigdata.main.RealTimeMain --master yarn  --driver-memory 2G
>>> --executor-memory 2G --num-executors 5 /home/zt/rt-analyze-1.0-SNAPSHOT.jar)
>>> on my cluster, i get a error:Caused by: java.lang.ClassNotFoundException:
>>> Class org.apache.phoenix.mapreduce.PhoenixOutputFormat not found.
>>>
>>> Exception in thread "main" java.lang.RuntimeException:
>>> java.lang.ClassNotFoundException: Class 
>>> org.apache.phoenix.mapreduce.PhoenixOutputFormat
>>> not foundat org.apache.hadoop.conf.Configu
>>> ration.getClass(Configuration.java:2112)at
>>> org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFor
>>> matClass(JobContextImpl.java:232)at org.apache.spark.rdd.PairRDDFu
>>> nctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:971)at
>>> org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:903)
>>>   at 
>>> org.apache.phoenix.spark.ProductRDDFunctions.saveToPhoenix(ProductRDDFunctions.scala:51)
>>>   at com.mypackage.save(DAOImpl.scala:41)at
>>> com.mypackage.ProtoStreamingJob.execute(ProtoStreamingJob.scala:58)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>   at 
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>   at java.lang.reflect.Method.invoke(Method.java:606)at
>>> com.mypackage.SparkApplication.sparkRun(SparkApplication.scala:95)
>>> at 
>>> com.mypackage.SparkApplication$delayedInit$body.apply(SparkApplication.scala:112)
>>>   at scala.Function0$class.apply$mcV$sp(Function0.scala:40)at
>>> scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
>>>   at scala.App$$anonfun$main$1.apply(App.scala:71)at
>>> scala.App$$anonfun$main$1.apply(App.scala:71)at
>>> scala.collection.immutable.List.foreach(List.scala:318)at
>>> scala.collection.generic.TraversableForwarder$class.foreach(
>>> TraversableForwarder.scala:32)at scala.App$class.main(App.scala:71)
>>>   at com.mypackage.SparkApplication.main(SparkApplication.scala:15)
>>> at com.mypackage.ProtoStreamingJobRunner.main(ProtoStreamingJob.scala)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>   at 
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>   at java.lang.reflect.Method.invoke(Method.java:606)at
>>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy
>>> $SparkSubmit$$runMain(SparkSubmit.scala:569)at
>>> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
>>>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
>>>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
>>>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)Caused
>>> by: java.lang.ClassNotFoundException: Class
>>> org.apache.phoenix.mapreduce.PhoenixOutputFormat not foundat
>>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2018)
>>>   at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2110)
>>>   ... 30 more
>>>
>>>
>>> Then i use spark-submit --jars(spark-submit --class
>>> com.bigdata.main.RealTimeMain --master yarn --jars
>>> /root/apache-phoenix-4.8.0-HBase-1.2-bin/phoenix-spark-4.8.0
>>> -HBase-1.2.jar,/root/apache-phoenix-4.8.0-HBase-1.2-bin/phoe
>>> nix-4.8.0-HBase-1.2-client.jar,/root/apache-phoenix-4.8.0-
>>> HBase-1.2-bin/phoenix-core-4.8.0-HBase-1.2.jar--driver-memory 2G
>>> --executor-memory 2G --num-executors 5 /home/zm/rt-analyze-1.0-SNAPSHOT.jar)
>>> , i get the same error. My cluster is CDH5.7,phoenix4.8.0, Hbase1.2,
>>> spark1.6 . How can i solve the promble ? Please help me. thanks.
>>>
>>
>>
>


Re: Phoenix + Spark

2016-10-26 Thread Sergey Soldatov
(1) You need only the client jar (phoenix-<version>-client.jar).
(2) Set spark.executor.extraClassPath in spark-defaults.conf to point at the
client jar.
Hope that helps.

Thanks,
Sergey

On Tue, Oct 25, 2016 at 9:31 PM, min zou  wrote:

> Dear, i use spark to do data analysis,then save the result to Phonix. When
> i run the application on Intellij IDEA by local model, the apllication runs
> ok, but i run it by spark-submit(spark-submit --class
> com.bigdata.main.RealTimeMain --master yarn  --driver-memory 2G
> --executor-memory 2G --num-executors 5 /home/zt/rt-analyze-1.0-SNAPSHOT.jar)
> on my cluster, i get a error:Caused by: java.lang.ClassNotFoundException:
> Class org.apache.phoenix.mapreduce.PhoenixOutputFormat not found.
>
> Exception in thread "main" java.lang.RuntimeException:
> java.lang.ClassNotFoundException: Class 
> org.apache.phoenix.mapreduce.PhoenixOutputFormat
> not foundat org.apache.hadoop.conf.Configu
> ration.getClass(Configuration.java:2112)at
> org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFor
> matClass(JobContextImpl.java:232)at org.apache.spark.rdd.PairRDDFu
> nctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:971)at
> org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:903)
>   at 
> org.apache.phoenix.spark.ProductRDDFunctions.saveToPhoenix(ProductRDDFunctions.scala:51)
>   at com.mypackage.save(DAOImpl.scala:41)at
> com.mypackage.ProtoStreamingJob.execute(ProtoStreamingJob.scala:58)at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)at
> com.mypackage.SparkApplication.sparkRun(SparkApplication.scala:95)at
> com.mypackage.SparkApplication$delayedInit$body.apply(SparkApplication.scala:112)
>   at scala.Function0$class.apply$mcV$sp(Function0.scala:40)at
> scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
>   at scala.App$$anonfun$main$1.apply(App.scala:71)at
> scala.App$$anonfun$main$1.apply(App.scala:71)at
> scala.collection.immutable.List.foreach(List.scala:318)at
> scala.collection.generic.TraversableForwarder$class.foreach(
> TraversableForwarder.scala:32)at scala.App$class.main(App.scala:71)
>   at com.mypackage.SparkApplication.main(SparkApplication.scala:15)at
> com.mypackage.ProtoStreamingJobRunner.main(ProtoStreamingJob.scala)at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)at
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy
> $SparkSubmit$$runMain(SparkSubmit.scala:569)at
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)Caused by:
> java.lang.ClassNotFoundException: Class 
> org.apache.phoenix.mapreduce.PhoenixOutputFormat
> not foundat org.apache.hadoop.conf.Configu
> ration.getClassByName(Configuration.java:2018)at
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2110)
> ... 30 more
>
>
> Then i use spark-submit --jars(spark-submit --class
> com.bigdata.main.RealTimeMain --master yarn --jars
> /root/apache-phoenix-4.8.0-HBase-1.2-bin/phoenix-spark-4.8.
> 0-HBase-1.2.jar,/root/apache-phoenix-4.8.0-HBase-1.2-bin/
> phoenix-4.8.0-HBase-1.2-client.jar,/root/apache-phoenix-4.8.
> 0-HBase-1.2-bin/phoenix-core-4.8.0-HBase-1.2.jar--driver-memory 2G
> --executor-memory 2G --num-executors 5 /home/zm/rt-analyze-1.0-SNAPSHOT.jar)
> , i get the same error. My cluster is CDH5.7,phoenix4.8.0, Hbase1.2,
> spark1.6 . How can i solve the promble ? Please help me. thanks.
>


Re: Creating Covering index on Phoenix

2016-10-23 Thread Sergey Soldatov
Hi Mich,
No, if you update HBase directly, the index will not be maintained.
I would actually suggest ingesting the data using the Phoenix CSV bulk load
tool instead.

Thanks,
Sergey.

On Sat, Oct 22, 2016 at 12:49 AM, Mich Talebzadeh <mich.talebza...@gmail.com
> wrote:

> Thanks Sergey,
>
> In this case the phoenix view is defined on Hbase table.
>
> Hbase table is updated every 15 minutes via cron that uses
> org.apache.hadoop.hbase.mapreduce.ImportTsv  to bulk load data into Hbase
> table,
>
> So if I create index on my view in Phoenix, will that index be maintained?
>
> regards
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 21 October 2016 at 23:35, Sergey Soldatov <sergeysolda...@gmail.com>
> wrote:
>
>> Hi Mich,
>>
>> It's really depends on the query that you are going to use. If conditions
>> will be applied only by time column you may create index like
>> create index I on "marketDataHbase" ("timecreated") include ("ticker",
>> "price");
>> If the conditions will be applied on others columns as well, you may use
>> create index I on "marketDataHbase" ("timecreated","ticker", "price");
>>
>> Index is updated together with the user table if you are using phoenix
>> jdbc driver or phoenix bulk load tools to ingest the data.
>>
>> Thanks,
>> Sergey
>>
>> On Fri, Oct 21, 2016 at 4:43 AM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>>
>>>
>>> Hi,
>>>
>>> I have  a Phoenix table on Hbase as follows:
>>>
>>> [image: Inline images 1]
>>>
>>> I want to create a covered index to cover the three columns: ticker,
>>> timecreated, price
>>>
>>> More importantly I want the index to be maintained when new rows are
>>> added to Hbase table.
>>>
>>> What is the best way of achieving this?
>>>
>>> Thanks
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>
>>
>


Re: Creating Covering index on Phoenix

2016-10-21 Thread Sergey Soldatov
Hi Mich,

It really depends on the query that you are going to use. If conditions will
be applied only to the time column, you may create an index like:
create index I on "marketDataHbase" ("timecreated") include ("ticker",
"price");
If conditions will be applied to other columns as well, you may use:
create index I on "marketDataHbase" ("timecreated","ticker", "price");

The index is updated together with the user table if you are using the
Phoenix JDBC driver or the Phoenix bulk load tools to ingest the data.
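
A quick way to confirm the index is actually picked up is to EXPLAIN a typical
query (the predicate below is just an illustration):

EXPLAIN SELECT "ticker", "price"
FROM "marketDataHbase"
WHERE "timecreated" > TO_DATE('2016-10-01 00:00:00');

The plan should show a range scan over the index table I rather than a full
scan of the data table.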

Thanks,
Sergey

On Fri, Oct 21, 2016 at 4:43 AM, Mich Talebzadeh 
wrote:

>
>
> Hi,
>
> I have  a Phoenix table on Hbase as follows:
>
> [image: Inline images 1]
>
> I want to create a covered index to cover the three columns: ticker,
> timecreated, price
>
> More importantly I want the index to be maintained when new rows are added
> to Hbase table.
>
> What is the best way of achieving this?
>
> Thanks
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>


Re: Scanning big region parallely

2016-10-21 Thread Sergey Soldatov
Hi Sanooj,

You may take a look at BaseResultIterators.getIterators() and
BaseResultIterators.getParallelScans().

Thanks,
Sergey

On Fri, Oct 21, 2016 at 6:02 AM, Sanooj Padmakumar 
wrote:

> Hi all
>
> If anyone can provide some information as to which part of the phoenix
> code we need to check to see how parallel execution is performed.
>
> Thanks again
> Sanooj
>
> On 20 Oct 2016 11:31 a.m., "Sanooj Padmakumar"  wrote:
>
>> Hi James,
>>
>> We are loading data from Phoenix tables into in-memory database. Based on
>> the query we are finding the number of phoenix input splits (similar to
>> what happens inside phoenix MR) and loads the data into in-memory database
>> in parallel. So we are looking for ways to further parallelize the scan of
>> a larger region.
>>
>> As you mentioned phoenix does this for all its queries. Can you please
>> provide pointers to the phoenix code where this happens ?
>>
>> Thanks for the prompt response.
>>
>> Thanks
>> Sanooj Padmakumar
>>
>> On Wed, Oct 19, 2016 at 11:22 PM, James Taylor 
>> wrote:
>>
>>> Hi Sanooj,
>>> I'm not sure what you mean by "loading data in our HBase table into
>>> in-memory", but Phoenix queries tables in parallel, even within a region
>>> depending on how you've configured statistics and guideposts as described
>>> here: http://phoenix.apache.org/update_statistics.html
>>>
>>> Thanks,
>>> James
>>>
>>>
>>> On Wednesday, October 19, 2016, Sanooj Padmakumar 
>>> wrote:
>>>
 Hi All


 We are are loading data in our HBase table into in-memory. For this we
 provide a start row and end row and scan the hbase regions. Is there a way
 we can scan a big region in parallel to fasten this whole process ? Any
 help/pointers on this will be of great help.

 --
 Thanks,
 Sanooj Padmakumar

>>>
>>
>>
>> --
>> Thanks,
>> Sanooj Padmakumar
>>
>


Re: NoClassDefFoundError org/apache/hadoop/hbase/HBaseConfiguration

2016-07-05 Thread Sergey Soldatov
Robert,
you should use the phoenix-4*-spark.jar that is located in the root phoenix
directory.

Thanks,
Sergey

On Tue, Jul 5, 2016 at 8:06 AM, Josh Elser  wrote:

> Looking into this on the HDP side. Please feel free to reach out via HDP
> channels instead of Apache channels.
>
> Thanks for letting us know as well.
>
> Josh Mahonin wrote:
>
>> Hi Robert,
>>
>> I recommend following up with HDP on this issue.
>>
>> The underlying problem is that the 'phoenix-spark-4.4.0.2.4.0.0-169.jar'
>> they've provided isn't actually a fat client JAR, it's missing many of
>> the required dependencies. They might be able to provide the correct JAR
>> for you, but you'd have to check with them. It may also be possible for
>> you to manually include all of the necessary JARs on the Spark classpath
>> to mimic the fat jar, but that's fairly ugly and time consuming.
>>
>> FWIW, the HDP 2.5 Tech Preview seems to include the correct JAR, though
>> I haven't personally tested it out yet.
>>
>> Good luck,
>>
>> Josh
>>
>> On Tue, Jul 5, 2016 at 2:00 AM, Robert James > > wrote:
>>
>> I'm trying to use Phoenix on Spark, and can't get around this error:
>>
>> java.lang.NoClassDefFoundError:
>> org/apache/hadoop/hbase/HBaseConfiguration
>>  at
>>
>> org.apache.phoenix.spark.PhoenixRDD.getPhoenixConfiguration(PhoenixRDD.scala:82)
>>
>> DETAILS:
>> 1. I'm running HDP 2.4.0.0-169
>> 2. Using phoenix-sqlline, I can access Phoenix perfectly
>> 3. Using hbase shell, I can access HBase perfectly
>> 4. I added the following lines to /etc/spark/conf/spark-defaults.conf
>>
>> spark.driver.extraClassPath
>> /usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.4.0.0-169
>> .jar
>> spark.executor.extraClassPath
>> /usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.4.0.0-169
>> .jar
>>
>> 5. Steps to reproduce the error:
>> # spark-shell
>> ...
>> scala> import org.apache.phoenix.spark._
>> import org.apache.phoenix.spark._
>>
>> scala> sqlContext.load("org.apache.phoenix.spark", Map("table" ->
>> "EMAIL_ENRON", "zkUrl" -> "localhost:2181"))
>> warning: there were 1 deprecation warning(s); re-run with -deprecation
>> for details
>> java.lang.NoClassDefFoundError:
>> org/apache/hadoop/hbase/HBaseConfiguration
>>  at
>>
>> org.apache.phoenix.spark.PhoenixRDD.getPhoenixConfiguration(PhoenixRDD.scala:82)
>>
>> // Or, this gets the same error
>> scala> val rdd = sc.phoenixTableAsRDD("EMAIL_ENRON", Seq("MAIL_FROM",
>> "MAIL_TO"), zkUrl=Some("localhost"))
>> java.lang.NoClassDefFoundError:
>> org/apache/hadoop/hbase/HBaseConfiguration
>>  at
>>
>> org.apache.phoenix.spark.PhoenixRDD.getPhoenixConfiguration(PhoenixRDD.scala:82)
>>  at
>>
>> org.apache.phoenix.spark.PhoenixRDD.phoenixConf$lzycompute(PhoenixRDD.scala:38)
>>
>> 6. I've tried every permutation I can think of, and also spent hours
>> Googling.  Some times I can get different errors, but always errors.
>> Interestingly, if I manage to load the HBaseConfiguration class
>> manually (by specifying classpaths and then import), I get a
>> "phoenixTableAsRDD is not a member of SparkContext" error.
>>
>> How can I use Phoenix from within Spark?  I'm really eager to do so,
>> but haven't been able to.
>>
>> Also: Can someone give me some background on the underlying issues
>> here? Trial-and-error-plus-google is not exactly high quality
>> engineering; I'd like to understand the problem better.
>>
>>
>>


Re: Storage benefits of ARRAY types

2016-05-25 Thread Sergey Soldatov
Hi Sumanta,
If the data type is fixed length, the serialized values are stored one by one.
If the data type has a variable length, then a special separator byte is
inserted between values, which accounts for the extra bytes you see.

Thanks,
Sergey

On Wed, May 25, 2016 at 4:07 AM, Sumanta Gh  wrote:

> Hi,
> I found that when a VARCHAR ARRAY is stored in Hbase, lots of extra-bytes
> are also stored as opposed to FLOAT ARRAY. In terms of better disk
> utilization, I think converting non-searchable columns into an ARRAY is
> much more efficient.
>
> Could you please let me know how ARRAY types are stored in HBase.
>
> - Sumanta
>
>
> -Sumanta Gh  wrote: -
> To: user@phoenix.apache.org
> From: Sumanta Gh 
> Date: 05/13/2016 04:57PM
> Subject: Storage benefits of ARRAY types
>
>
> Hi,
> Do we gain any disk utilization benefit while defining columns as an array?
>
> - Sumanta
>
>
> =-=-=
> Notice: The information contained in this e-mail
> message and/or attachments to it may contain
> confidential or privileged information. If you are
> not the intended recipient, any dissemination, use,
> review, distribution, printing or copying of the
> information contained in this e-mail message
> and/or attachments to it are strictly prohibited. If
> you have received this communication in error,
> please notify us by reply e-mail or telephone and
> immediately and permanently delete the message
> and any attachments. Thank you
>
>


Re: duplicated collumn name

2016-05-25 Thread Sergey Soldatov
Hi,
Since the table was not properly created, the only reasonable solution is
to delete it:
delete from SYSTEM.CATALOG where TABLE_NAME='TABLE_20160511';

And in hbase shell
disable 'TABLE_20160511'
drop 'TABLE_20160511'

Thanks,
Sergey

On Tue, May 24, 2016 at 2:04 AM, Tim Polach  wrote:

> Hi everyone,
>
>
>
> we found a problem when creating a table with duplicated collumn name,
> create was done but now the table cannot be dropped or searched by.
>
>
>
> Does anyone know how we could solve the problem or delete the problematic
> tables?
>
>
>
> Thanks!
>
>
>
> Using: phoenix-4.6.0 HBase-1.1
>
>
>
> Logs:
>
> Create:
>
> 2016-05-13 10:23:04,814 INFO
> (PhoenixUtils.java[createTableAndIndexForDate]:54) => Creating table in
> Phoenix...
>
> 2016-05-13 10:23:04,815 DEBUG
> (PhoenixUtils.java[generatePhoenixCreateTableStatement]:260) => CREATE
> TABLE IF NOT EXISTS TABLE_20160511 (c1 VARCHAR NOT NULL, c2 VARCHAR, c2
> VARCHAR, CONSTRAINT PK PRIMARY KEY (c1))
>
>
>
> Drop or select:
>
> 0: jdbc:phoenix:hadoop-node1> select * from TABLE_20160511;
>
> 16/05/23 22:56:17 WARN ipc.CoprocessorRpcChannel: Call failed on
> IOException
>
> org.apache.hadoop.hbase.DoNotRetryIOException:
> org.apache.hadoop.hbase.DoNotRetryIOException: TABLE_20160511: null
>
> at
> org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:84)
>
> at
> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:457)
>
> at
> org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:11609)
>
> at
> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7390)
>
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1873)
>
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1855)
>
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
>
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
>
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>
> at java.lang.Thread.run(Thread.java:745)
>
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>
>
>
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
>
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>
> at
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>
> at
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
>
> at
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:322)
>
> at
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.execService(ProtobufUtil.java:1619)
>
> at
> org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:92)
>
> at
> org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:89)
>
> at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
>
> at
> org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(RegionCoprocessorRpcChannel.java:95)
>
> at
> org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callMethod(CoprocessorRpcChannel.java:56)
>
> at
> org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService$Stub.getTable(MetaDataProtos.java:11769)
>
> at
> org.apache.phoenix.query.ConnectionQueryServicesImpl$7.call(ConnectionQueryServicesImpl.java:1303)
>
> at
> org.apache.phoenix.query.ConnectionQueryServicesImpl$7.call(ConnectionQueryServicesImpl.java:1290)
>
> at org.apache.hadoop.hbase.client.HTable$16.call(HTable.java:1727)
>
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
> at java.lang.Thread.run(Thread.java:745)
>
> Caused by:
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException):
> org.apache.hadoop.hbase.DoNotRetryIOException: TABLE_20160511: null
>
> at
> org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:84)
>
> at
> 

Re: [EXTERNAL] Re: TEXT Data type in Phoenix?

2016-03-30 Thread Sergey Soldatov
Jon,
I believe it's just metadata. The VARCHAR implementation itself doesn't rely
on the declared size.

Thanks,
Sergey

On Wed, Mar 30, 2016 at 4:51 PM, Cox, Jonathan A <ja...@sandia.gov> wrote:
> Sergey,
>
> Thanks for the tip. Is there any real performance reason (memory or speed) to 
> use a pre-defined length for VARCHAR? Or is it really all the same under the 
> hood?
>
> -Jonathan
>
> -Original Message-
> From: sergey.solda...@gmail.com [mailto:sergey.solda...@gmail.com] On Behalf 
> Of Sergey Soldatov
> Sent: Wednesday, March 30, 2016 5:38 PM
> To: user@phoenix.apache.org
> Subject: [EXTERNAL] Re: TEXT Data type in Phoenix?
>
> Jon,
> It seems that documentation is a bit outdated. VARCHAR supports exactly what 
> you want:
> create table x (id bigint primary key, x varchar); upsert into x values (1, 
> ". (a lot of text there) " );
> 0: jdbc:phoenix:localhost> select length(x) from x;
> ++
> | LENGTH(X)  |
> ++
> | 1219   |
> ++
> 1 row selected (0.009 seconds)
>
> Thanks,
> Sergey
>
> On Wed, Mar 30, 2016 at 1:57 PM, Cox, Jonathan A <ja...@sandia.gov> wrote:
>> Is it possible to have the equivalent of the SQL data type “TEXT” with
>> Phoenix? The reason being, my data has columns with unspecified text length.
>> If I go with a varchar,  loading the entire CSV file into the database
>> may fail if one entry is too long.
>>
>>
>>
>> Maybe, however, there is really no reason to use TEXT with Phoenix?
>> Perhaps just using VARCHAR with a very long size is equivalent in
>> terms of performance and memory usage (given that Phoenix is HBase under the 
>> hood)?
>>
>>
>>
>> Thanks,
>>
>> Jon


Re: TEXT Data type in Phoenix?

2016-03-30 Thread Sergey Soldatov
Jon,
It seems that the documentation is a bit outdated. VARCHAR supports
exactly what you want:
create table x (id bigint primary key, x varchar);
upsert into x values (1, ". (a lot of text there) " );
0: jdbc:phoenix:localhost> select length(x) from x;
++
| LENGTH(X)  |
++
| 1219   |
++
1 row selected (0.009 seconds)

Thanks,
Sergey

On Wed, Mar 30, 2016 at 1:57 PM, Cox, Jonathan A  wrote:
> Is it possible to have the equivalent of the SQL data type “TEXT” with
> Phoenix? The reason being, my data has columns with unspecified text length.
> If I go with a varchar,  loading the entire CSV file into the database may
> fail if one entry is too long.
>
>
>
> Maybe, however, there is really no reason to use TEXT with Phoenix? Perhaps
> just using VARCHAR with a very long size is equivalent in terms of
> performance and memory usage (given that Phoenix is HBase under the hood)?
>
>
>
> Thanks,
>
> Jon


Re: How to use VARBINARY with CsvBulkLoadTool

2016-03-29 Thread Sergey Soldatov
Hi Jon,
It is supposed to be Base64.
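
For example (hypothetical table and file), the CSV field for a VARBINARY
column holds the Base64 text of the raw bytes:

CREATE TABLE blobs (id BIGINT NOT NULL PRIMARY KEY, payload VARBINARY);
-- matching CSV line, where aGVsbG8= is the Base64 encoding of the bytes "hello":
-- 1,aGVsbG8=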

Thanks,
Sergey

On Tue, Mar 29, 2016 at 12:38 PM, Cox, Jonathan A  wrote:
> I am wondering how I can use the CsvBulkLoadTool to insert binary data to a
> table. For one thing, which format does CsvBulkLoadTool expect the data to
> be encoded as within the CSV, when inserted into a VARBINARY type? Hex?
> Base64? Something else? Is there a way to choose or specify?
>
>
>
> Thanks,
>
> Jon


Re: How phoenix converts Integer to byte array under the hood

2016-03-23 Thread Sergey Soldatov
Mohammad,
Honestly speaking, I'm not sure. Possibly other folks have a definitive
answer. All I can say is that this API hasn't changed for the last 1.5
years.
Thanks,
Sergey

On Wed, Mar 23, 2016 at 1:07 AM, Mohammad Adnan Raza
<adnanfai...@gmail.com> wrote:
> Thank you Sergey for quick response. I was exactly looking for this. This
> saved my hours digging into phoenix code base.
> Now I am decoding like -
> PInteger.INSTANCE.getCodec().decodeInt(bytes, 0, SortOrder.getDefault())
> and encoding it like
> byte[] baselineBytes = new byte[PInteger.INSTANCE.getByteSize()];
> PInteger.INSTANCE.getCodec().encodeInt(baseline, baselineBytes, 0);
>
> I just have a doubt here.. Does phoenix expose this class as an API or it is
> internal class? if they do expose it as API I can use it otherwise in future
> (with all JAVA9 JIGSAW feature or may be changes in API itself), it may
> start breaking.
>
>
>
> On Tue, Mar 22, 2016 at 11:11 PM, Sergey Soldatov <sergeysolda...@gmail.com>
> wrote:
>>
>> Hi Mohammad,
>> The right class to look into is PInteger. It has static class IntCodec
>> which is using for code/decode integers.
>>
>> Thanks,
>> Sergey
>>
>> On Tue, Mar 22, 2016 at 7:15 AM, Mohammad Adnan Raza
>> <adnanfai...@gmail.com> wrote:
>> > I am changing my question a bit to be more precise...
>> > Given a phoenix table with INTEGER column type. And if I fire upsert
>> > statement with integer value. How phoenix converts it to byte array and
>> > put
>> > in the Hbase table.
>> > Or if anyone can tell me which class is responsible for that conversion
>> > so I
>> > can look that code.
>> >
>> >
>> >
>> > On Tue, Mar 22, 2016 at 2:00 PM, Mohammad Adnan Raza
>> > <adnanfai...@gmail.com>
>> > wrote:
>> >>
>> >> Hello Everyone,
>> >>
>> >> I have created phoenix table like this
>> >>
>> >> CREATE TABLE PRODUCT_DETAILS(NAME VARCHAR NOT NULL PRIMARY
>> >> KEY,CF.VOLUME
>> >> INTEGER,CF.PRICE INTEGER,CF.DISCOUNT INTEGER,CF.BASELINE
>> >> INTEGER,CF.UPLIFT
>> >> INTEGER,CF.FINALPRICE INTEGER,CF.SALEPRICE INTEGER);
>> >>
>> >> Now the datatype Integer is 4 byte signed integer. I wonder how phoenix
>> >> converts this to hbase specific byte array.
>> >> https://phoenix.apache.org/language/datatypes.html link does talk about
>> >> conversion of other data types but not for INTEGER. For example
>> >> UNSIGNED_INT
>> >> is converted as Bytes.toInt(). I don't get a proper method for integer.
>> >> anyone knows about it?
>> >>
>> >> --
>> >>
>> >> With Best Regards,
>> >>
>> >>Mohd Adnan
>> >>
>> >
>> >
>> >
>> > --
>> >
>> > With Best Regards,
>> >
>> >Mohd Adnan
>> >
>> >Feature Development Lead
>> >
>> >Mobile   +91-7498194516
>> >Blog   adnanfaizan.blogspot.in
>> >
>
>
>
>
> --
>
> With Best Regards,
>
>Mohd Adnan
>
>Feature Development Lead
>
>Mobile   +91-7498194516
>Blog   adnanfaizan.blogspot.in
>


Re: Extend CSVBulkLoadTool

2016-03-22 Thread Sergey Soldatov
Anil,
It's not necessary to use a comma; you may use any other single character as
the delimiter.
And you are right: the number of splits (fields) must match the number of
columns.

Thanks,
Sergey

On Tue, Mar 22, 2016 at 6:31 AM, Anil <anilk...@gmail.com> wrote:
> Thanks Sergey for the response. i cannot change the decimeter in my file as
> comma used as valid char for my data.
>
> From my understanding of the code, number of splits in csv record must match
> with number of columns. Agree?
>
> Regards,
> Anil
>
>
>
> On 21 March 2016 at 23:52, Sergey Soldatov <sergeysolda...@gmail.com> wrote:
>>
>> Hi Anil,
>> It will be really painful since CSV bulk load is using Apache common
>> CSV format tool for parsing input lines and it expects that the
>> delimiter is a single character. I would suggest to prepare files
>> before bulk load replacing the delimiter string with a single
>> character using perl/sed scripts. It will be much easier.
>>
>> Thanks,
>> Sergey
>>
>> On Sun, Mar 20, 2016 at 11:05 PM, Anil <anilk...@gmail.com> wrote:
>> > Hi ,
>> >
>> > I see that CSVBulkLoadTool accepts the delimiter as a single character
>> > only, so I have to customize it. Is there documentation of the steps the
>> > bulk load tool performs? Please share.
>> >
>> > Thanks,
>> > Anil
>> >
>> >
>> >
>> >
>> >
>
>
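
As a rough illustration of the preprocessing suggested above, a small Java
program can rewrite a multi-character delimiter to a single character before
the file is handed to CSVBulkLoadTool with that single character configured as
its delimiter. The file names and the "::" and "|" delimiters below are
hypothetical placeholders:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DelimiterRewrite {
    public static void main(String[] args) throws IOException {
        String multiCharDelimiter = "::";   // hypothetical delimiter used in the source file
        String singleCharDelimiter = "|";   // single character the bulk load tool will be told to use

        try (BufferedReader in = Files.newBufferedReader(Paths.get("input.txt"), StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(Paths.get("input-fixed.csv"), StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Plain literal replacement; this assumes '|' never occurs in the data itself.
                out.write(line.replace(multiCharDelimiter, singleCharDelimiter));
                out.newLine();
            }
        }
    }
}

The equivalent sed one-liner would be: sed 's/::/|/g' input.txt > input-fixed.csv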


Re: How phoenix converts Integer to byte array under the hood

2016-03-22 Thread Sergey Soldatov
Hi Mohammad,
The right class to look into is PInteger. It has a static class, IntCodec,
which is used to encode/decode integers.

Thanks,
Sergey

On Tue, Mar 22, 2016 at 7:15 AM, Mohammad Adnan Raza
 wrote:
> I am changing my question a bit to be more precise...
> Given a Phoenix table with an INTEGER column type, if I fire an upsert
> statement with an integer value, how does Phoenix convert it to a byte array
> and put it in the HBase table?
> Alternatively, can anyone tell me which class is responsible for that
> conversion so I can look at the code?
>
>
>
> On Tue, Mar 22, 2016 at 2:00 PM, Mohammad Adnan Raza 
> wrote:
>>
>> Hello Everyone,
>>
>> I have created phoenix table like this
>>
>> CREATE TABLE PRODUCT_DETAILS(NAME VARCHAR NOT NULL PRIMARY KEY,CF.VOLUME
>> INTEGER,CF.PRICE INTEGER,CF.DISCOUNT INTEGER,CF.BASELINE INTEGER,CF.UPLIFT
>> INTEGER,CF.FINALPRICE INTEGER,CF.SALEPRICE INTEGER);
>>
>> Now, the INTEGER datatype is a 4-byte signed integer. I wonder how Phoenix
>> converts this to an HBase-specific byte array.
>> The https://phoenix.apache.org/language/datatypes.html page does talk about
>> the conversion of other data types, but not INTEGER. For example, UNSIGNED_INT
>> is converted with Bytes.toInt(). I can't find the corresponding method for
>> INTEGER. Does anyone know about it?
>>
>> --
>>
>> With Best Regards,
>>
>>Mohd Adnan
>>
>
>
>
> --
>
> With Best Regards,
>
>Mohd Adnan
>
>Feature Development Lead
>
>Mobile   +91-7498194516
>Blog   adnanfaizan.blogspot.in
>
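
Beyond the IntCodec mentioned above, the same round trip can also go through
the higher-level PDataType methods. A minimal sketch, again assuming the
Phoenix 4.3+ org.apache.phoenix.schema.types classes; per the datatypes page,
signed INTEGER is stored as 4 big-endian bytes with the sign bit flipped so
that negative values sort before positive ones, which is why it differs from
UNSIGNED_INT's plain Bytes.toBytes(int) encoding:

import org.apache.phoenix.schema.types.PInteger;

public class PhoenixIntBytes {
    public static void main(String[] args) {
        // Serialize an INTEGER column value the way Phoenix does on upsert:
        // 4 bytes, big-endian, sign bit flipped to keep the binary sort order correct.
        byte[] bytes = PInteger.INSTANCE.toBytes(123);

        // Deserialize back to a Java Integer.
        Integer value = (Integer) PInteger.INSTANCE.toObject(bytes);

        System.out.println(value); // prints 123
    }
}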


Re: Kerberos ticket renewal

2016-03-19 Thread Sergey Soldatov
Where do you see this error? Is it on the client side? Ideally you don't
need to renew the ticket yourself, since the Phoenix driver gets the required
information (principal name and keytab path) from the JDBC connection
string and performs the User.login itself.

Thanks,
Sergey

On Wed, Mar 16, 2016 at 11:02 AM, Sanooj Padmakumar  wrote:
> This is the error in the log when it fails
>
> ERROR org.apache.hadoop.security.UserGroupInformation -
> PriviledgedActionException as: (auth:KERBEROS)
> cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Failed to find
> any Kerberos tgt)]
>
> On Wed, Mar 16, 2016 at 8:35 PM, Sanooj Padmakumar 
> wrote:
>>
>> Hi Anil
>>
>> Thanks for your reply.
>>
>> We do not do anything explicitly in the code for the ticket renewal;
>> what we do is run a cron job for the user for which the ticket has to be
>> renewed. But with this approach we need a restart to get things going again
>> after the ticket expires.
>>
>> We use the following connection URL for getting the Phoenix connection:
>> jdbc:phoenix:<quorum>:<port>:/hbase:<principal>:<path to keytab>
>>
>> This, along with the entries in hbase-site.xml and core-site.xml, is passed
>> to the connection object.
>>
>> Thanks
>> Sanooj Padmakumar
>>
>> On Tue, Mar 15, 2016 at 12:04 AM, anil gupta 
>> wrote:
>>>
>>> Hi,
>>>
>>> At my previous job, we had web services fetching data from a secure HBase
>>> cluster. We never needed to renew the lease by restarting the webserver; our
>>> app used to renew the ticket. I think Phoenix/HBase already handles renewing
>>> the ticket. Maybe you need to look into your Kerberos environment settings. How
>>> are you authenticating with Phoenix/HBase?
>>> Sorry, I dont remember the exact kerberos setting that we had.
>>>
>>> HTH,
>>> Anil Gupta
>>>
>>> On Mon, Mar 14, 2016 at 11:00 AM, Sanooj Padmakumar 
>>> wrote:

 Hi

 We have a REST-style microservice application fetching data from HBase
 using Phoenix. The cluster is Kerberos-secured, and we run a cron job to renew
 the Kerberos ticket on the machine where the microservice is deployed.

 But it always needs a restart of the microservice Java process to get the
 Kerberos ticket working again after it has expired.

 Is there a way I can avoid this restart?

 Any pointers will be very helpful. Thanks

 PS : We have a Solr based micro service which works without a restart.

 Regards
 Sanooj
>>>
>>>
>>>
>>>
>>> --
>>> Thanks & Regards,
>>> Anil Gupta
>>
>>
>>
>>
>> --
>> Thanks,
>> Sanooj Padmakumar
>
>
>
>
> --
> Thanks,
> Sanooj Padmakumar
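
For illustration, a minimal sketch of opening a connection with the principal
and keytab embedded in the JDBC URL, as described above. The quorum, principal,
keytab path and the sample query are placeholders, not values from this thread:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;

public class SecurePhoenixConnect {
    public static void main(String[] args) throws SQLException {
        // Placeholders: ZooKeeper quorum, port, znode parent, principal and keytab path.
        // With the last two parts present, the driver performs the Kerberos login itself,
        // so no external kinit/cron renewal should be needed for this connection.
        String url = "jdbc:phoenix:zk1.example.com,zk2.example.com:2181:/hbase"
                + ":svc_app@EXAMPLE.COM:/etc/security/keytabs/svc_app.keytab";

        try (Connection conn = DriverManager.getConnection(url);
             ResultSet rs = conn.createStatement().executeQuery(
                     "SELECT TABLE_NAME FROM SYSTEM.CATALOG LIMIT 5")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}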


Re: Phoenix table is inaccessible...

2016-03-11 Thread Sergey Soldatov
The system information about all Phoenix tables is located in the HBase
SYSTEM.CATALOG table. So, if you recreate the catalog you will need to
recreate all the tables as well. I'm not sure whether there is any other way
to fix it.

On Fri, Mar 11, 2016 at 4:25 PM, Saurabh Agarwal (BLOOMBERG/ 731 LEX)
<sagarwal...@bloomberg.net> wrote:
> Thanks. I will try that. A question: I am able to access other tables fine.
> If SYSTEM.CATALOG got corrupted, wouldn't it impact all tables?
>
> Also, how do I restore the SYSTEM.CATALOG table without restarting sqlline?
>
>
> Sent from Bloomberg Professional for iPhone
>
>
> - Original Message -
> From: Sergey Soldatov <sergeysolda...@gmail.com>
> To: SAURABH AGARWAL, user@phoenix.apache.org
> CC: ANIRUDHA JADHAV
> At: 11-Mar-2016 19:07:31
>
> Hi Saurabh,
It seems that your SYSTEM.CATALOG got corrupted somehow. Usually you
need to disable and drop 'SYSTEM.CATALOG' in the hbase shell. After that,
restart sqlline (it will automatically recreate the system catalog) and
recreate all user tables. The table data usually is not affected, but
just in case, make a backup of your HBase first.

Possibly someone has better advice.
>
> Thanks,
> Sergey
>
> On Fri, Mar 11, 2016 at 3:05 PM, Saurabh Agarwal (BLOOMBERG/ 731 LEX)
> <sagarwal...@bloomberg.net> wrote:
>> Hi,
>>
>> I have been experimenting with different indexes on a Phoenix table to get
>> the desired performance.
>>
>> After creating a secondary index that indexes one column and includes the
>> rest of the fields, the table started throwing the following exceptions
>> whenever I access it.
>>
>> Can you point me to what might have gone wrong here?
>>
>> We are using HDP 2.3 - HBase 1.1.2.2.3.2.0-2950,
>> phoenix-4.4.0.2.3.2.0-2950
>>
>> 0: jdbc:phoenix:> select count(*) from "Weather";
>> 16/03/11 17:37:32 WARN ipc.CoprocessorRpcChannel: Call failed on
>> IOException
>> org.apache.hadoop.hbase.DoNotRetryIOException:
>> org.apache.hadoop.hbase.DoNotRetryIOException: com.bloomb
>> erg.ds.WeatherSmallSalt: 35
>> at
>> org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:84)
>> at
>>
>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:447)
>> at
>>
>> org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataPr
>> otos.java:10505)
>> at
>>
>> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7435)
>> at
>>
>> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:187
>> 5)
>> at
>>
>> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1857)
>> at
>>
>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(Cl
>> ientProtos.java:32209)
>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>> at
>> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.ArrayIndexOutOfBoundsException: 35
>> at org.apache.phoenix.schema.PTableImpl.init(PTableImpl.java:354)
>> at org.apache.phoenix.schema.PTableImpl.(PTableImpl.java:276)
>> at org.apache.phoenix.schema.PTableImpl.makePTable(PTableImpl.java:265)
>> at
>>
>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:826)
>> at
>>
>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:462)
>> at
>>
>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:1696
>> )
>> at
>>
>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:1643
>> )
>> at
>>
>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.addIndexToTable(MetaDataEndpointImpl.java
>> :526)
>> at
>>
>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:803)
>> at
>>
>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:462)
>> at
>>
>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:1696
>> )
>> at
>>
>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGe
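
The disable/drop steps suggested above are normally typed into the hbase shell.
Purely as an illustration, the same thing expressed through the HBase 1.x Java
Admin API might look like the sketch below; take an HBase backup first, and be
prepared to re-issue the CREATE TABLE / CREATE INDEX DDL for every Phoenix
table afterwards:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ResetPhoenixCatalog {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // picks up hbase-site.xml on the classpath

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            TableName catalog = TableName.valueOf("SYSTEM.CATALOG");
            if (admin.tableExists(catalog)) {
                // Equivalent of: disable 'SYSTEM.CATALOG'; drop 'SYSTEM.CATALOG' in the hbase shell.
                admin.disableTable(catalog);
                admin.deleteTable(catalog);
            }
        }
        // The next sqlline session will recreate SYSTEM.CATALOG; user tables must then
        // be re-created with their original DDL (the existing HBase data is left in place).
    }
}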

Re: HBase Phoenix Integration

2016-02-29 Thread Sergey Soldatov
Hi Amit,

Switching to 4.3 means you need HBase 0.98. What kind of problem did you
experience after building 4.6 from source with the changes suggested on
StackOverflow?

Thanks,
Sergey

On Sun, Feb 28, 2016 at 10:49 PM, Amit Shah  wrote:
> An update -
>
> I was able to execute the "./sqlline.py " command, but I
> get the same exception as I mentioned earlier.
>
> Later I tried following the steps mentioned on this link with Phoenix 4.3.0,
> but I still get an error, this time with a different stack trace (attached to
> this mail).
>
> Any help would be appreciated
>
> On Sat, Feb 27, 2016 at 8:03 AM, Amit Shah  wrote:
>>
>> Hi Murugesan,
>>
>> What preconditions would I need on the server to execute the Python
>> script? I have Python 2.7.5 installed on the ZooKeeper server. If I just
>> copy the sqlline script to the /etc/hbase/conf directory and execute it, I
>> get the import errors below. Note that this time I had the 4.5.2-HBase-1.0
>> version of the Phoenix server and core jars in the HBase lib directory on
>> the master and region servers.
>>
>> Traceback (most recent call last):
>>   File "./sqlline.py", line 25, in 
>> import phoenix_utils
>> ImportError: No module named phoenix_utils
>>
>> Pardon me for my knowledge about python.
>>
>> Thanks,
>> Amit
>>
>> On Fri, Feb 26, 2016 at 11:26 PM, Murugesan, Rani 
>> wrote:
>>>
>>> Did you test and confirm your phoenix shell from the zookeeper server?
>>>
>>> cd /etc/hbase/conf
>>>
>>> > phoenix-sqlline.py <zookeeper-host>:2181
>>>
>>>
>>>
>>>
>>>
>>> From: Amit Shah [mailto:amits...@gmail.com]
>>> Sent: Friday, February 26, 2016 4:45 AM
>>> To: user@phoenix.apache.org
>>> Subject: HBase Phoenix Integration
>>>
>>>
>>>
>>> Hello,
>>>
>>>
>>>
>>> I have been trying to install phoenix on my cloudera hbase cluster.
>>> Cloudera version is CDH5.5.2 while HBase version is 1.0.
>>>
>>>
>>>
>>> I copied the server & core jars (version 4.6-HBase-1.0) to the master and
>>> region servers and restarted the HBase cluster. I copied the corresponding
>>> client jar to my SQuirreL client, but I get an exception on connect, pasted
>>> below. The connection URL is "jdbc:phoenix:<zookeeper-host>:2181".
>>>
>>> I even tried compiling the source by adding cloudera dependencies as
>>> suggested on this post but didn't succeed.
>>>
>>>
>>>
>>> Any suggestions to make this work?
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Amit.
>>>
>>>
>>>
>>> 
>>>
>>>
>>>
>>> Caused by:
>>> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException):
>>> org.apache.hadoop.hbase.DoNotRetryIOException: SYSTEM.CATALOG:
>>> org.apache.hadoop.hbase.client.Scan.setRaw(Z)Lorg/apache/hadoop/hbase/client/Scan;
>>>
>>> at
>>> org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:87)
>>>
>>> at
>>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.createTable(MetaDataEndpointImpl.java:1319)
>>>
>>> at
>>> org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:11715)
>>>
>>> at
>>> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7388)
>>>
>>> at
>>> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1776)
>>>
>>> at
>>> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1758)
>>>
>>> at
>>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
>>>
>>> at
>>> org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2034)
>>>
>>> at
>>> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>>>
>>> at
>>> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>>>
>>> at
>>> org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>>>
>>> at java.lang.Thread.run(Thread.java:745)
>>>
>>> Caused by: java.lang.NoSuchMethodError:
>>> org.apache.hadoop.hbase.client.Scan.setRaw(Z)Lorg/apache/hadoop/hbase/client/Scan;
>>>
>>> at
>>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.buildDeletedTable(MetaDataEndpointImpl.java:1016)
>>>
>>> at
>>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.loadTable(MetaDataEndpointImpl.java:1092)
>>>
>>> at
>>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.createTable(MetaDataEndpointImpl.java:1266)
>>>
>>> ... 10 more
>>>
>>>
>>>
>>> P.S - The full stacktrace is attached in the mail.
>>
>>
>


Re: Looks Like a SELECT Bug, But LIMIT Makes It Work

2016-02-23 Thread Sergey Soldatov
Hi Steve,
It looks like a bug. So, please file a JIRA.

Thanks,
Sergey

On Tue, Feb 23, 2016 at 12:52 PM, Steve Terrell  wrote:
> I came across a 4.6.0 query that I could not make work unless I add a
> "limit" to the end, where it should be totally unnecessary.
>
> select * from BUGGY where F1=1 and F3 is null
> results in no records found
>
> select * from BUGGY where F1=1 and F3 is null limit 999
> results (correctly) in one record found
>
> I think it's a bug where Phoenix gets confused about my HBase columns and
> views.  Here's how to set up the table to duplicate my problem:
>
> CREATE TABLE BUGGY (
>   F1 INTEGER NOT NULL,
>   A.F2 INTEGER,
>   B.F3 INTEGER,
>   CONSTRAINT my_pk PRIMARY KEY (F1));
> create view if not exists BUGGY_A as select * from BUGGY;
> alter view BUGGY_A drop column if exists F3;
> create view if not exists BUGGY_B as select * from BUGGY;
> alter view BUGGY_B drop column if exists F2;
> upsert into BUGGY(F1) values (1)
>
> Do you think I should file a JIRA for this?  Or is it a misunderstanding on
> my part?
>
> Thank you,
> Steve


Re: Phoenix rowkey

2016-02-22 Thread Sergey Soldatov
You do it exactly the way you described. Separate the varchars with a zero
byte and use a fixed 8 bytes for the long. So, for example, if you have a
(varchar, u_long, varchar) primary key, the rowkey for values like 'X', 1, 'Y'
will be:
('X', 0x00) (0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01) ('Y')


Thanks,
Sergey

On Mon, Feb 22, 2016 at 8:43 AM, Sanooj Padmakumar  wrote:
> Hi,
>
> I have an HBase table created using Phoenix, and the primary key is a
> combination of 7 columns, of which one is of type UNSIGNED_LONG and the
> rest are varchar.
>
> I want to perform select and upsert operations on this table using the direct
> HBase APIs. How do I construct the rowkey in this case? I read that the
> separator used is (byte) 0 for varchar columns, but in my table there is an
> UNSIGNED_LONG, hence the question.
>
> Any suggestions will be of great help. Thanks
>
> PS : I cannot use direct phoenix here because Phoenix on Spark within a
> kerberised cluster is not working
>
> --
> Thanks,
> Sanooj Padmakumar
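
A minimal sketch of the composition described above, for a hypothetical
(VARCHAR, UNSIGNED_LONG, VARCHAR) primary key: UTF-8 bytes for each varchar, a
single zero byte after the variable-length column, and the fixed 8 big-endian
bytes that UNSIGNED_LONG shares with Bytes.toBytes(long). Signed Phoenix types
such as BIGINT flip the sign bit, so this shortcut only applies to the unsigned
types:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.hbase.util.Bytes;

public class RowKeyBuilder {
    // Row key for a hypothetical PRIMARY KEY (v1 VARCHAR, id UNSIGNED_LONG, v2 VARCHAR).
    static byte[] buildKey(String v1, long id, String v2) throws IOException {
        ByteArrayOutputStream key = new ByteArrayOutputStream();
        key.write(v1.getBytes(StandardCharsets.UTF_8));
        key.write(0);                  // zero-byte separator after a variable-length VARCHAR
        key.write(Bytes.toBytes(id));  // UNSIGNED_LONG: fixed 8 bytes, big-endian
        key.write(v2.getBytes(StandardCharsets.UTF_8)); // last column needs no trailing separator
        return key.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Matches the example above: ('X', 0x00)(0x00 ... 0x01)('Y')
        System.out.println(Bytes.toStringBinary(buildKey("X", 1L, "Y")));
    }
}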


Re: Multiple upserts via JDBC

2016-02-19 Thread Sergey Soldatov
Zack,
Actually, the command line and GUI tools use the same JDBC layer. It
would be nice if you could provide more information about the application
itself.
Meanwhile, you may try setting autocommit to false on the connection
and calling .commit() once all the upserts are done.

Thanks,
Sergey

On Fri, Feb 19, 2016 at 4:02 AM, Riesland, Zack
<zack.riesl...@sensus.com> wrote:
> Thanks Sergey,
>
> The upserts are much faster via command line or a GUI tool like Aquadata 
> Studio.
>
> Table structure is below.
>
> I'm creating a new user with 8 permissions. So 9 total upserts.
>
> Individually, via command line, this is almost instantaneous. But via JDBC, 
> it takes tens of seconds to minutes.
>
> create table user (
>   user_id varchar (40) not null,
>   password_hash varchar (200) ,
>   user_full_name varchar (40),
>   user_email_address varchar (60),
>   token varchar ( 36),
>   expiration date
>   CONSTRAINT pk_user PRIMARY KEY (user_id)
> );
>
> create table user_access(
>user_id varchar(30) not null ,
>screen_id tinyint not null, --key to sda.screen
>access_id tinyint --key to sda.screen_access
>CONSTRAINT pk_user_access PRIMARY KEY (user_id, screen_id)
> );
>
> -----Original Message-----
> From: sergey.solda...@gmail.com [mailto:sergey.solda...@gmail.com] On Behalf 
> Of Sergey Soldatov
> Sent: Friday, February 19, 2016 3:01 AM
> To: user@phoenix.apache.org
> Subject: Re: Multiple upserts via JDBC
>
> Hi Zack,
>
> Have you tried using sqlline to manually do those upserts to check the
> performance? Information about the table structures would be useful as well.
>
> Thanks,
> Sergey
>
> On Tue, Feb 16, 2016 at 8:10 AM, Riesland, Zack <zack.riesl...@sensus.com> 
> wrote:
>> I have a handful of VERY small phoenix tables (< 100 entries).
>>
>>
>>
>> I wrote some javascript to interact with the tables via servlet + JDBC.
>>
>>
>>
>> I can query the data almost instantaneously, but upserting is
>> extremely slow – on the order of tens of seconds to several minutes.
>>
>>
>>
>> The main write operation does 10 upserts. Is there a better way to do
>> this than 10 separate statement.execute() commands?
>>
>>
>>
>> Is there a way to pass all 10 at once?
>>
>>
>>
>> Any tips on why these upserts might be so slow? I see that the tables
>> are backed by one region, so the overhead should be minimal.
>>
>>
>>
>> Thanks!
>>
>>
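
A minimal sketch of the approach suggested in this thread; the connection URL
and parameter values are hypothetical, and the UPSERT statements follow the two
tables defined above. Turn autocommit off, run the nine upserts through
prepared statements, and commit once so the client sends the mutations as a
single batch rather than one round trip per row:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchedUpserts {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zkhost:2181")) {
            conn.setAutoCommit(false); // buffer mutations client-side instead of flushing per upsert

            try (PreparedStatement user = conn.prepareStatement(
                         "UPSERT INTO user (user_id, user_full_name) VALUES (?, ?)");
                 PreparedStatement access = conn.prepareStatement(
                         "UPSERT INTO user_access (user_id, screen_id, access_id) VALUES (?, ?, ?)")) {

                user.setString(1, "jdoe");
                user.setString(2, "Jane Doe");
                user.executeUpdate();

                for (int screenId = 1; screenId <= 8; screenId++) {
                    access.setString(1, "jdoe");
                    access.setByte(2, (byte) screenId);
                    access.setByte(3, (byte) 1);
                    access.executeUpdate();
                }

                conn.commit(); // all nine upserts are sent to the server here in one batch
            }
        }
    }
}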


Re: Multiple upserts via JDBC

2016-02-19 Thread Sergey Soldatov
Hi Zack,

Have you tried using sqlline to manually do those upserts to check
the performance? Information about the table structures would be
useful as well.

Thanks,
Sergey

On Tue, Feb 16, 2016 at 8:10 AM, Riesland, Zack
 wrote:
> I have a handful of VERY small phoenix tables (< 100 entries).
>
>
>
> I wrote some javascript to interact with the tables via servlet + JDBC.
>
>
>
> I can query the data almost instantaneously, but upserting is extremely slow
> – on the order of tens of seconds to several minutes.
>
>
>
> The main write operation does 10 upserts. Is there a better way to do this
> than 10 separate statement.execute() commands?
>
>
>
> Is there a way to pass all 10 at once?
>
>
>
> Any tips on why these upserts might be so slow? I see that the tables are
> backed by one region, so the overhead should be minimal.
>
>
>
> Thanks!
>
>