Re: Architectural Understanding

2016-09-22 Thread James Taylor
On Thu, Sep 22, 2016 at 12:47 PM, John Leach  wrote:

> Can you validate my understanding?
>
> 1.  Importing Data: Online load via Python and offline load via MapReduce.
>

There are many ways to import data. Since Phoenix stays true to the basic
HBase data model, you can import data in any way that HBase supports,
independent of Phoenix APIs, as long as you define your Phoenix schema to
be compatible with the serialization format you used for your cell values
and row key. Many users operate in this way. For example, there's a Storm
bolt that imports data directly.
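
To make that concrete, here's a minimal sketch (the table, column names, and
ZooKeeper quorum are hypothetical) that writes a cell with the plain HBase
client and reads it back through the Phoenix JDBC driver. It assumes a
VARCHAR-only schema, since Phoenix serializes VARCHAR as plain UTF-8 bytes;
in general your writes must match Phoenix's type encodings for every column
value and for the row key.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseNativeLoad {
  public static void main(String[] args) throws Exception {
    // Assumes the table was created in Phoenix as, e.g.:
    //   CREATE TABLE T (PK VARCHAR PRIMARY KEY, F.V VARCHAR)

    // 1. Write a cell with the plain HBase API -- no Phoenix involved.
    Configuration conf = HBaseConfiguration.create();
    try (org.apache.hadoop.hbase.client.Connection hconn =
             ConnectionFactory.createConnection(conf);
         Table table = hconn.getTable(TableName.valueOf("T"))) {
      Put put = new Put(Bytes.toBytes("row1"));  // VARCHAR pk = UTF-8 bytes
      put.addColumn(Bytes.toBytes("F"), Bytes.toBytes("V"),
                    Bytes.toBytes("hello"));     // VARCHAR value = UTF-8 bytes
      table.put(put);
    }

    // 2. Read the same row back through the Phoenix JDBC driver.
    try (Connection conn =
             DriverManager.getConnection("jdbc:phoenix:localhost:2181");
         ResultSet rs = conn.createStatement()
             .executeQuery("SELECT PK, V FROM T")) {
      while (rs.next()) {
        System.out.println(rs.getString(1) + " -> " + rs.getString(2));
      }
    }
  }
}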

Specifically using Phoenix-backed APIs, you can import using:
- CSV bulk import (MR-based; see the example invocation after this list)
- Hive (using our Phoenix Hive Storage Handler)
- Pig scripts (using our Phoenix StoreFunc)
- MR directly (using our Phoenix RecordWriter)
- Flume (using our Phoenix Flume integration)
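
For instance, the MR-based CSV bulk import is typically launched with the
CsvBulkLoadTool; the jar name, input path, table, and quorum below are
placeholders:

hadoop jar phoenix-<version>-client.jar \
    org.apache.phoenix.mapreduce.CsvBulkLoadTool \
    --table EXAMPLE \
    --input /data/example.csv \
    --zookeeper zk1:2181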


> 2.  Transactional System: Tephra (Centralized Transactional System based
> on Yahoo’s Omid)
>

Yes. Stay tuned - we may have an Apache Omid one in the future too.
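
For context, the Tephra integration is exposed through SQL. Here's a minimal
sketch (the table and values are hypothetical; it assumes
phoenix.transactions.enabled is set on both client and server and that a
Tephra transaction manager is running):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class TxnSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             DriverManager.getConnection("jdbc:phoenix:localhost:2181");
         Statement stmt = conn.createStatement()) {
      // TRANSACTIONAL=true opts the table into Tephra-managed transactions.
      stmt.execute("CREATE TABLE IF NOT EXISTS ACCOUNTS ("
          + "ID BIGINT PRIMARY KEY, BALANCE DECIMAL) TRANSACTIONAL=true");

      conn.setAutoCommit(false);
      // Both writes become visible atomically at commit, or not at all.
      stmt.execute("UPSERT INTO ACCOUNTS VALUES (1, 100.0)");
      stmt.execute("UPSERT INTO ACCOUNTS VALUES (2, 200.0)");
      conn.commit();
    }
  }
}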


> 3.  Analytical Engine: HBase Coprocessors and JDBC Server/Client (i.e.
> where do you do aggregations and handle intermediate results)
>

We also have Spark, Hive, and Pig integration for analytics.
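
As one example on the Spark side, the phoenix-spark integration exposes a
Phoenix table as a DataFrame (the table name and ZooKeeper URL below are
placeholders):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PhoenixSparkRead {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("phoenix-read")
        .getOrCreate();

    // Load a Phoenix table as a DataFrame; column pruning and simple
    // filters are pushed down into the underlying Phoenix/HBase scans.
    Dataset<Row> df = spark.read()
        .format("org.apache.phoenix.spark")
        .option("table", "ACCOUNTS")
        .option("zkUrl", "localhost:2181")
        .load();

    df.filter("BALANCE > 100").show();
  }
}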


> 4.  Yarn Support: No except for MapReduce and Index Creation Bits
>

See above - many of those APIs tie into YARN. The standard queries you run
through our JDBC driver do not, though.


> 5.  Resource Management: ?  Thread pool w/ first in first out?
>

We have some configurations that drive resource management, mostly around
memory usage. For example, you can restrict a tenant to a percentage of the
available memory on the server side. Beyond that we rely on HBase to do
resource management to a large extent; for example, a schema in Phoenix maps
to a namespace in HBase, which opens up the various resource management
mechanisms HBase provides.
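
As a concrete illustration, two of the memory-related server-side knobs go
in hbase-site.xml on the region servers (the values here are arbitrary
examples, not recommendations):

<!-- Max % of heap that all Phoenix queries together may hold
     for intermediate results on a region server -->
<property>
  <name>phoenix.query.maxGlobalMemoryPercentage</name>
  <value>15</value>
</property>
<!-- Max % of that global Phoenix allotment a single tenant may consume -->
<property>
  <name>phoenix.query.maxTenantMemoryPercentage</name>
  <value>20</value>
</property>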


>
> Is this accurate?
>
> Regards,
> John Leach


Architectural Understanding

2016-09-22 Thread John Leach
Can you validate my understanding?

1.  Importing Data: Online load via Python and offline load via MapReduce.
2.  Transactional System: Tephra (Centralized Transactional System based on
Yahoo’s Omid)
3.  Analytical Engine: HBase Coprocessors and JDBC Server/Client (i.e. where do 
you do aggregations and handle intermediate results)
4.  Yarn Support: No except for MapReduce and Index Creation Bits
5.  Resource Management: ?  Thread pool w/ first in first out?

Is this accurate?

Regards,
John Leach