Hi Alaa Ali,
That's right, when using the PhoenixInputFormat you can apply simple 'WHERE'
clauses and then perform any aggregate functions you'd like from within
Spark. Any aggregations you run won't be quite as fast as the native
Phoenix queries, but once the data is available as an RDD you can process
it however you like.
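As a rough sketch, assuming the rows have already landed in an RDD of
(timestamp, name) pairs (the parallelized sample below just stands in for a
real load), the equivalent of a COUNT(*) GROUP BY is a plain reduceByKey:

import org.apache.spark.SparkContext._  // pair-RDD functions in Spark 1.x

// Stand-in for rows loaded from Phoenix; 'sc' is an existing SparkContext.
val rows = sc.parallelize(Seq(
  (java.sql.Timestamp.valueOf("2014-11-21 17:26:00"), "alice"),
  (java.sql.Timestamp.valueOf("2014-11-21 17:27:00"), "bob"),
  (java.sql.Timestamp.valueOf("2014-11-21 17:28:00"), "alice")))

// Spark-side equivalent of: SELECT ename, COUNT(*) ... GROUP BY ename
val countsByName = rows
  .map { case (_, ename) => (ename, 1L) }
  .reduceByKey(_ + _)

countsByName.collect().foreach(println)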
Thanks Alex! I'm actually working with views from HBase because I will
never edit the HBase table from Phoenix and I'd hate to accidentally drop
it. I'll have to work out how to create the view with the additional ID
column.
Regards,
Alaa Ali
On Fri, Nov 21, 2014 at 5:26 PM, Alex Kamil wrote:
I want to run queries on Apache Phoenix which has a JDBC driver. The query
that I want to run is:
select ts,ename from random_data_date limit 10
But I'm having issues with the JdbcRDD lowerBound and upperBound parameters
(which I don't actually understand).
Here's what I have so far:
import
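For reference, a minimal JdbcRDD call against the Phoenix JDBC driver might
look like the sketch below; the connection URL, the numeric id column, and
the bound values are all assumptions for illustration. The query must carry
exactly two '?' placeholders, which JdbcRDD fills with each partition's
slice of the bounds:

import java.sql.{DriverManager, ResultSet}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.JdbcRDD

val sc = new SparkContext(
  new SparkConf().setAppName("phoenix-jdbc").setMaster("local[2]"))

val rows = new JdbcRDD(
  sc,
  // Assumed Zookeeper quorum in the Phoenix connection URL.
  () => DriverManager.getConnection("jdbc:phoenix:localhost:2181"),
  // The two '?' are mandatory; they receive each partition's range bounds.
  "SELECT ts, ename FROM random_data_date WHERE id >= ? AND id <= ?",
  lowerBound = 1L,
  upperBound = 1000L,      // placeholder range over a numeric id column
  numPartitions = 4,
  mapRow = (rs: ResultSet) => (rs.getTimestamp(1), rs.getString(2)))

rows.take(10).foreach(println)  // roughly the effect of the original LIMIT 10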
Hi Alaa Ali,
In order for Spark to split the JDBC query in parallel, it expects an upper
and lower bound for your input data, as well as a number of partitions so
that it can split the query across multiple tasks.
For example, depending on your data distribution, you could set an upper
and lower bound on a numeric column (such as a sequence-generated id), and
Spark will issue one query per partition, each covering its own slice of
that range.
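Concretely, JdbcRDD carves the closed interval [lowerBound, upperBound]
into numPartitions roughly equal ranges and runs one copy of the query per
range. The sketch below mirrors the partitioning arithmetic in
org.apache.spark.rdd.JdbcRDD for a hypothetical bound of 1 to 100 over 5
partitions:

val lowerBound = 1L
val upperBound = 100L
val numPartitions = 5

// Same arithmetic JdbcRDD.getPartitions uses to hand each task its range.
val length = BigInt(1) + upperBound - lowerBound
for (i <- 0 until numPartitions) {
  val start = lowerBound + ((i * length) / numPartitions)
  val end   = lowerBound + (((i + 1) * length) / numPartitions) - 1
  println(s"partition $i runs: ... WHERE id >= $start AND id <= $end")
}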
Awesome, thanks Josh, I missed that previous post of yours! But your code
snippet shows a select statement, so I can just run a simple select with a
where clause if I want to, and then run my data processing on the RDD to
mimic the aggregation I'd otherwise do in SQL, right? Also, another
Ali,
just create a BIGINT column with numeric values in Phoenix and use sequences
(http://phoenix.apache.org/sequences.html) to populate it automatically; a
sketch follows.
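A minimal sketch of that approach through the Phoenix JDBC driver, assuming
a Phoenix-managed table named random_data_date with ts as its primary key
(views mapped over existing HBase tables are read-only, so they won't
accept the UPSERT); the sequence name and connection URL are placeholders:

import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")
val stmt = conn.createStatement()

// A monotonically increasing sequence to hand out ids.
stmt.execute("CREATE SEQUENCE IF NOT EXISTS row_id_seq")

// Add the BIGINT column, then back-fill it for the existing rows.
stmt.execute("ALTER TABLE random_data_date ADD id BIGINT")
stmt.execute(
  "UPSERT INTO random_data_date (ts, id) " +
  "SELECT ts, NEXT VALUE FOR row_id_seq FROM random_data_date")

conn.commit()  // Phoenix connections don't auto-commit by default
conn.close()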
I included the setup below in case someone starts from scratch
Prerequisites:
- export JAVA_HOME, SCALA_HOME and install sbt
- install hbase in