Re: Python phoenixdb adapter and JSON serialization on PQS

2018-11-05 Thread Manoj Ganesan
Thanks for the pointers, Josh. I'm working on putting together a concise,
representative test case to demonstrate the issue.

Meanwhile, I had one question regarding the following:

> You are right that the operations in PQS should be exactly the same,
> regardless of the client you're using -- that is how this architecture
> works.


IIUC, this means the following 2 methods should yield the same result:

   1. sqlline-thin.py -s JSON 
   2. using a Python Avatica client script making JSON requests

I made the following change in hbase-site.xml on the PQS host:


<property>
  <name>phoenix.queryserver.serialization</name>
  <value>JSON</value>
</property>


I notice that executing "sqlline-thin.py -s JSON " returns
results just fine. However, when I use a simple script to try the same
query, it returns 0 rows. I'm attaching the Python script here. The script
essentially makes HTTP calls using the Avatica JSON reference
(http://calcite.apache.org/avatica/docs/json_reference.html). I assumed
that the sqlline-thin wrapper (when passed the -s JSON flag) also makes HTTP
calls based on the JSON reference -- is that not correct?
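
Roughly, the script does something like the following -- this is just a
sketch to show the shape of the calls, not the attached script itself; the
PQS endpoint, connection id, table, and query below are placeholders:

    import json
    import urllib.request

    PQS_URL = "http://localhost:8765"  # placeholder PQS endpoint

    def post(payload):
        # Every Avatica JSON call is an HTTP POST with a JSON body.
        req = urllib.request.Request(
            PQS_URL,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))

    conn_id = "any-unique-string"  # placeholder connection id

    post({"request": "openConnection", "connectionId": conn_id})
    stmt = post({"request": "createStatement", "connectionId": conn_id})
    result = post({
        "request": "prepareAndExecute",
        "connectionId": conn_id,
        "statementId": stmt["statementId"],
        "sql": "SELECT col1, col2 FROM my_table LIMIT 20",  # placeholder query
        "maxRowCount": -1,
    })
    print(result["results"][0]["firstFrame"]["rows"])
    post({"request": "closeConnection", "connectionId": conn_id})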

I'll work on getting some test cases here soon to illustrate this as well
as the performance problem.

Thanks again!
Manoj

On Mon, Nov 5, 2018 at 10:43 AM Josh Elser  wrote:

> Does the OOME happen regardless of whether you use the Java client
> (sqlline-thin) or the Python client? I would like to know more about this
> one. If you can
> share something that reproduces the problem for you, I'd like to look
> into it. The only suggestion I have at this point in time is to make
> sure you set a reasonable max-heap size in hbase-env.sh (e.g. -Xmx) via
> PHOENIX_QUERYSERVER_OPTS and have HBASE_CONF_DIR pointing to the right
> directory when you launch PQS.
>
> Regarding performance, as you've described it, it sounds like the Python
> driver is just slower than the Java driver. You are right that the
> operations in PQS should be exactly the same, regardless of the client
> you're using -- that is how this architecture works. Avatica is a wire
> protocol that all clients use to talk to PQS. More digging/information
> you can provide about the exact circumstances (and, again,
> steps/environment to reproduce what you see) would be extremely helpful.
>
> Thanks Manoj.
>
> - Josh
>
> On 11/2/18 7:16 PM, Manoj Ganesan wrote:
> > Thanks Josh for the response!
> >
> > I would definitely like to use protobuf serialization, but I'm observing
> > performance issues when running queries that return a large number of
> > results. One problem is that I observe PQS running out of memory while (as
> > far as I can tell) it's serializing the results in Avatica. The other is
> > that the phoenixdb Python adapter itself spends a large amount of time in
> > the logic
> > <https://github.com/apache/phoenix/blob/master/python/phoenixdb/phoenixdb/cursor.py#L248>
> > where it's converting the protobuf rows to Python objects.
> >
> > Interestingly, when we use sqlline-thin.py instead of the Python phoenixdb
> > adapter, the protobuf serialization works fine and responses are fast. It's
> > not clear to me why PQS would have problems when using the Python adapter
> > and not when using sqlline-thin -- do they follow different code paths
> > (especially around serialization)?
> >
> > Thanks again,
> > Manoj
> >
> > On Fri, Nov 2, 2018 at 4:05 PM Josh Elser wrote:
> >
> > I would strongly suggest you do not use the JSON serialization.
> >
> > The JSON support is implemented via Jackson, which has no means to make
> > backwards compatibility "easy". By contrast, protobuf makes this
> > extremely easy, and we have multiple examples over the past years where
> > we've been able to fix bugs in a backwards-compatible manner.
> >
> > If you want the thin client to continue to work across versions, stick
> > with protobuf.
> >
> > On 11/2/18 5:27 PM, Manoj Ganesan wrote:
> >  > Hey everyone,
> >  >
> >  > I'm trying to make the Python phoenixdb adapter work with JSON
> >  > serialization on PQS.
> >  >
> >  > I'm using Phoenix 4.14 and the adapter works fine with protobuf,
> but
> >  > when I try making it work with an older version of phoenixdb
> > (before the
> >  > JSON to protobuf switch was introduced), it just returns 0 rows.
> > I don't
> >  > see anything in particular wrong with the HTTP requests themselves,
> > and they
> >  > seem to conform to the Avatica JSON spec
> >  > (http://calcite.apache.org/avatica/docs/json_reference.html).
> >  >
> >  > Here's the result (with some debug statements) that returns 0
> rows.
> >  > Notice the *"firstFrame":{"offset":0,"done":true,"rows":[]* below:
> >  >
> >  > request body =  {"maxRowCount": -2, "connectionId":
> >  > "68c05d12-5770-47d6-b3e4-dba556db4790", "request":
> > "prepareAndExecute",
> >  > "statementId": 3, "sql": "SELECT col1, col2 from table limit 20"}
> >  > request headers =  {'content-type': 

Re: High-availability for transactions

2018-11-05 Thread Curtis Howard
Hi Omid,

Thanks for the quick reply.

When the Omid/Phoenix integration is complete, will it be released only
with Phoenix 5.x, or will it also be integrated and tested with 4.14.x
branches?
Can you comment at all on what level of testing the integration will have
had, once released?  (Will it be considered an alpha or a beta release
initially?)

Curtis


Re: High-availability for transactions

2018-11-05 Thread Ohad Shacham
Hi Curtis,

Omid does provide high availability for transactions; you can find the full
technical details here:
https://www.usenix.org/conference/fast17/technical-sessions/presentation/shacham
In a nutshell, in-flight transactions in this case are aborted by the
transaction manager; they are identified because every transaction manager
starts a new epoch. We also have an invalidation mechanism to guarantee
snapshot isolation, and this guarantees that high availability does not
incur any overhead on the mainstream execution path.

We are currently in the final stages of the integration with Phoenix and
will start a dedicated release of Omid in a few days.

Thx,
Ohad


On Mon, Nov 5, 2018 at 7:15 PM Curtis Howard 
wrote:

> Hi,
>
> Is there a best approach to ensuring high-availability for transactions?
> It seems that one option when using Tephra could be through the 
> CFG_DATA_TX_ZOOKEEPER_QUORUM
> property:
>
> https://github.com/apache/incubator-tephra/blob/d0a1c4c295fd28e68223db220b13dc1b12b326da/tephra-core/src/main/java/org/apache/tephra/TxConstants.java#L224-L226
>
> I've tested this with a couple of Tephra manager processes on different
> hosts, and they do seem to hand off control between the leader and standby
> instances.  It's not clear to me, though, how "in-flight" transactions that
> have been initiated but not yet committed would be handled during a
> failover.
>
> I also see that there has been recent integration work with Apache Omid as
> an alternative transaction manager - is it expected that Omid will (or
> maybe does already) provide high-availability for transactions?
>
> Thanks!
> Curtis
>
>
>


Re: Python phoenixdb adapter and JSON serialization on PQS

2018-11-05 Thread Josh Elser
Does the OOME happen regardless of whether you use the Java client
(sqlline-thin) or the Python client? I would like to know more about this
one. If you can
share something that reproduces the problem for you, I'd like to look 
into it. The only suggestion I have at this point in time is to make 
sure you set a reasonable max-heap size in hbase-env.sh (e.g. -Xmx) via 
PHOENIX_QUERYSERVER_OPTS and have HBASE_CONF_DIR pointing to the right 
directory when you launch PQS.


Regarding performance, as you've described it, it sounds like the Python 
driver is just slower than the Java driver. You are right that the 
operations in PQS should be exactly the same, regardless of the client 
you're using -- that is how this architecture works. Avatica is a wire 
protocol that all clients use to talk to PQS. More digging/information 
you can provide about the exact circumstances (and, again, 
steps/environment to reproduce what you see) would be extremely helpful.


Thanks Manoj.

- Josh

On 11/2/18 7:16 PM, Manoj Ganesan wrote:

Thanks Josh for the response!

I would definitely like to use protobuf serialization, but I'm observing
performance issues when running queries that return a large number of results.
One problem is that I observe PQS running out of memory while (as far as I can
tell) it's serializing the results in Avatica. The other is that the phoenixdb
Python adapter itself spends a large amount of time in the logic
<https://github.com/apache/phoenix/blob/master/python/phoenixdb/phoenixdb/cursor.py#L248>
where it's converting the protobuf rows to Python objects.
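
For what it's worth, a rough way to see where that time goes is to wrap the
fetch in cProfile -- the connection URL and query here are just placeholders:

    import cProfile
    import pstats

    import phoenixdb  # the Python adapter discussed in this thread

    # Placeholder PQS endpoint and query; use one that returns a large result set.
    conn = phoenixdb.connect("http://localhost:8765/", autocommit=True)
    cursor = conn.cursor()

    profiler = cProfile.Profile()
    profiler.enable()
    cursor.execute("SELECT col1, col2 FROM some_large_table")
    rows = cursor.fetchall()  # row conversion happens while fetching
    profiler.disable()

    # Shows cumulative time spent in cursor.py's row conversion vs. waiting on PQS.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)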


Interestingly, when we use sqlline-thin.py instead of the Python phoenixdb
adapter, the protobuf serialization works fine and responses are fast. It's
not clear to me why PQS would have problems when using the Python adapter
and not when using sqlline-thin -- do they follow different code paths
(especially around serialization)?


Thanks again,
Manoj

On Fri, Nov 2, 2018 at 4:05 PM Josh Elser wrote:


I would strongly suggest you do not use the JSON serialization.

The JSON support is implemented via Jackson, which has no means to make
backwards compatibility "easy". By contrast, protobuf makes this
extremely easy, and we have multiple examples over the past years where
we've been able to fix bugs in a backwards-compatible manner.

If you want the thin client to continue to work across versions, stick
with protobuf.
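
For anyone following along: the current phoenixdb adapter speaks protobuf by
default, so nothing special is needed on the client side. A minimal usage
sketch, with a placeholder PQS URL, table, and query:

    import phoenixdb

    # Placeholder PQS endpoint; the adapter uses Avatica protobuf by default.
    conn = phoenixdb.connect("http://localhost:8765/", autocommit=True)
    cursor = conn.cursor()
    cursor.execute("SELECT col1, col2 FROM my_table LIMIT 20")  # placeholder query
    for row in cursor.fetchall():
        print(row)
    conn.close()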

On 11/2/18 5:27 PM, Manoj Ganesan wrote:
 > Hey everyone,
 >
 > I'm trying to make the Python phoenixdb adapter work with JSON
 > serialization on PQS.
 >
 > I'm using Phoenix 4.14 and the adapter works fine with protobuf, but
 > when I try making it work with an older version of phoenixdb
(before the
 > JSON to protobuf switch was introduced), it just returns 0 rows.
I don't
 > see anything in particular wrong with the HTTP requests themselves,
and they
 > seem to conform to the Avatica JSON spec
 > (http://calcite.apache.org/avatica/docs/json_reference.html).
 >
 > Here's the result (with some debug statements) that returns 0 rows.
 > Notice the *"firstFrame":{"offset":0,"done":true,"rows":[]* below:
 >
 > request body =  {"maxRowCount": -2, "connectionId":
 > "68c05d12-5770-47d6-b3e4-dba556db4790", "request":
"prepareAndExecute",
 > "statementId": 3, "sql": "SELECT col1, col2 from table limit 20"}
 > request headers =  {'content-type': 'application/json'}
 > _post_request: got response {'fp':  0x7f858330b9d0>, 'status': 200, 'will_close': False, 'chunk_left':
 > 'UNKNOWN', 'length': 1395, 'strict': 0, 'reason': 'OK',
'version': 11,
 > 'debuglevel': 0, 'msg':  0x7f84fb50be18>, 'chunked': 0, '_method': 'POST'}
 > response.read(): body =
 >

{"response":"executeResults","missingStatement":false,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"},"results":[{"response":"resultSet","connectionId":"68c05d12-5770-47d6-b3e4-dba556db4790","statementId":3,"ownStatement":true,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable
 >

":0,"signed":true,"displaySize":40,"label":"COL1","columnName":"COL1","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"},{"ordinal":1,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable":0,"signed":true,"displaySize":40,"label":"COL2","columnName":"COL2","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":null,"parameters":

Re: ABORTING region server and following HBase cluster "crash"

2018-11-05 Thread Josh Elser
Thanks, Neelesh. It came off to me like "Phoenix is no good, Cassandra 
has something that works better".


I appreciate you taking the time to clarify! That really means a lot.

On 11/2/18 8:14 PM, Neelesh wrote:
By no means am I judging Phoenix based on this. This is simply a design 
trade-off (scylladb goes the same route and builds global indexes). I 
appreciate all the effort that has gone into Phoenix, and it was indeed
a life saver. But the technical point remains that single-node failures
have the potential to cascade to the entire cluster. That's the nature of
global indexes, not specific to Phoenix.


I apologize if my response came off as dismissing Phoenix altogether.
FWIW, I'm a big advocate of Phoenix at my org internally, albeit for the
newer version.



On Fri, Nov 2, 2018, 4:09 PM Josh Elser wrote:


I would strongly disagree with the assertion that this is some
unavoidable problem. Yes, an inverted index is a data structure which,
by design, creates a hotspot (phrased another way, this is "data
locality").

Lots of extremely smart individuals have spent a significant amount of
time and effort in stabilizing secondary indexes in the past 1-2 years,
not to mention others spending time on a local index implementation.
Judging Phoenix in its entirety based off of an arbitrarily old version
of Phoenix is disingenuous.

On 11/2/18 2:00 PM, Neelesh wrote:
 > I think this is an unavoidable problem in some sense, if global
indexes
 > are used. Essentially global indexes create a  graph of dependent
region
 > servers due to index rpc calls from one RS to another. Any single
 > failure is bound to affect the entire graph, which under
reasonable load
 > becomes the entire HBase cluster. We had to drop global indexes
just to
 > keep the cluster running for more than a few days.
 >
 >  > I think Cassandra has local secondary indexes precisely because
of this
 > issue. Last I checked there were significant pending improvements
 > required for Phoenix local indexes, especially around read paths
( not
 > utilizing primary key prefixes in secondary index reads where
possible,
 > for example)
 >
 >
 >  > On Thu, Sep 13, 2018, 8:12 PM Jonathan Leech <jonat...@gmail.com>
 >  > wrote:
 >
 >     This seems similar to a failure scenario I’ve seen a couple
times. I
 >     believe after multiple restarts you got lucky and tables were
 >     brought up by Hbase in the correct order.
 >
 >     What happens is some kind of semi-catastrophic failure where 1 or
 >     more region servers go down with edits that weren’t flushed,
and are
 >     only in the WAL. These edits belong to regions whose tables have
 >     secondary indexes. Hbase wants to replay the WAL before
bringing up
 >     the region server. Phoenix wants to talk to the index region
during
 >     this, but can’t. It fails enough times then stops.
 >
 >     The more region servers / tables / indexes affected, the more
likely
 >     that a full restart will get stuck in a classic deadlock. A good
 >     old-fashioned data center outage is a great way to get
started with
 >     this kind of problem. You might make some progress and get stuck
 >     again, or restart number N might get those index regions
initialized
 >     before the main table.
 >
 >     The sure fire way to recover a cluster in this condition is to
 >     strategically disable all the tables that are failing to come up.
 >     You can do this from the Hbase shell as long as the master is
 >     running. If I remember right, it’s a pain since the disable
command
 >     will hang. You might need to disable a table, kill the shell,
 >     disable the next table, etc. Then restart. You’ll eventually
have a
 >     cluster with all the region servers finally started, and a
bunch of
 >     disabled regions. If you disabled index tables, enable one,
wait for
 >     it to become available; eg its WAL edits will be replayed, then
 >     enable the associated main table and wait for it to come
online. If
 >     Hbase did it’s job without error, and your failure didn’t include
 >     losing 4 disks at once, order will be restored. Lather, rinse,
 >     repeat until everything is enabled and online.
 >
 >      A big enough failure sprinkled with a little bit of
bad luck
 >     and what seems to be a Phoenix flaw == deadlock trying to get
HBASE
 >     to start up. Fix by forcing the order that Hbase brings regions
 >     online. Finally, never go full restart. 
 >
 >      > On Sep 10, 2018, at 7:30 PM, Batyrshin Alexander
 >     <0x62...@gmail.com 

High-availability for transactions

2018-11-05 Thread Curtis Howard
Hi,

Is there a best approach to ensuring high-availability for transactions?
It seems that one option when using Tephra could be through the
CFG_DATA_TX_ZOOKEEPER_QUORUM
property:
https://github.com/apache/incubator-tephra/blob/d0a1c4c295fd28e68223db220b13dc1b12b326da/tephra-core/src/main/java/org/apache/tephra/TxConstants.java#L224-L226
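
In my test setup I put the corresponding property in hbase-site.xml along
these lines -- the key name is just my reading of TxConstants at the link
above (please correct me if that's wrong), and the quorum hosts are
placeholders:

    <property>
      <name>data.tx.zookeeper.quorum</name>
      <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
    </property>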

I've tested this with a couple of Tephra manager processes on different
hosts, and they do seem to hand off control between the leader and standby
instances.  It's not clear to me, though, how "in-flight" transactions that
have been initiated but not yet committed would be handled during a
failover.

I also see that there has been recent integration work with Apache Omid as
an alternative transaction manager - is it expected that Omid will (or
maybe does already) provide high-availability for transactions?

Thanks!
Curtis