Re: Spark SQL with Apache Phoenix lower and upper Bound

2014-11-24 Thread Josh Mahonin
Hi Alaa Ali, That's right, when using the PhoenixInputFormat, you can do simple 'WHERE' clauses and then perform any aggregate functions you'd like from within Spark. Any aggregations you run won't be quite as fast as the equivalent queries run natively in Phoenix, but once it's available as an RDD you can ...
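
As a concrete illustration of the "aggregate from within Spark" step, here is a rough sketch using the Spark 1.1/1.2-era Spark SQL API, assuming the Phoenix rows have already been loaded into an RDD of (ts, ename) pairs; the phoenixRows variable and its sample data are hypothetical stand-ins for whatever PhoenixInputFormat returns:

    // In spark-shell; phoenixRows stands in for the RDD loaded from Phoenix,
    // already narrowed by a WHERE clause on the Phoenix side.
    val phoenixRows = sc.parallelize(Seq(
      ("2014-11-21 10:00:00", "alice"),
      ("2014-11-21 10:05:00", "bob"),
      ("2014-11-22 09:00:00", "alice")))

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.createSchemaRDD        // implicit RDD[Product] -> SchemaRDD (Spark 1.1/1.2)

    case class Rec(ts: String, ename: String)
    val recs = phoenixRows.map { case (ts, ename) => Rec(ts, ename) }

    // Register the RDD and run the aggregation with Spark SQL instead of Phoenix.
    recs.registerTempTable("random_data_date")
    sqlContext.sql("SELECT ename, COUNT(*) FROM random_data_date GROUP BY ename")
      .collect()
      .foreach(println)

The same aggregation could equally be expressed with core RDD operations, as discussed further down the thread.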

Re: Spark SQL with Apache Phoenix lower and upper Bound

2014-11-22 Thread Alaa Ali
Thanks Alex! I'm actually working with views from HBase because I will never edit the HBase table from Phoenix and I'd hate to accidentally drop it. I'll have to work out how to create the view with the additional ID column. Regards, Alaa Ali On Fri, Nov 21, 2014 at 5:26 PM, Alex Kamil ...
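
For reference, the DDL for mapping an existing HBase table as a Phoenix view with an extra numeric column looks roughly like the sketch below, executed through the Phoenix JDBC driver. The row key name pk, the column family "d" and the localhost ZooKeeper quorum are assumptions; note also that a Phoenix view mapped onto an existing HBase table is typically read-only, so populating the new column is a separate step:

    import java.sql.DriverManager

    Class.forName("org.apache.phoenix.jdbc.PhoenixDriver")
    val conn = DriverManager.getConnection("jdbc:phoenix:localhost")
    val stmt = conn.createStatement()

    // The view name must match the HBase table name; quoting keeps it case sensitive.
    stmt.execute(
      """CREATE VIEW "random_data_date" (
        |  pk VARCHAR PRIMARY KEY,    -- maps to the HBase row key
        |  "d"."ts" VARCHAR,          -- existing column family / qualifier
        |  "d"."ename" VARCHAR,
        |  "d"."row_id" BIGINT        -- additional numeric column to partition on
        |)""".stripMargin)

    conn.close()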

Spark SQL with Apache Phoenix lower and upper Bound

2014-11-21 Thread Alaa Ali
I want to run queries on Apache Phoenix, which has a JDBC driver. The query that I want to run is: select ts,ename from random_data_date limit 10. But I'm having issues with the JdbcRDD upperBound and lowerBound parameters (which I don't actually understand). Here's what I have so far: import ...
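
Since the archive preview cuts the snippet off, here is the general shape such a JdbcRDD call takes against the Phoenix JDBC driver, as a sketch rather than the original code: the row_id partition column, the localhost ZooKeeper quorum and the bound values are all assumptions, and the SQL must contain exactly two '?' placeholders for Spark to fill in:

    import java.sql.DriverManager
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.JdbcRDD

    val sc = new SparkContext(new SparkConf().setAppName("phoenix-jdbcrdd").setMaster("local[*]"))

    val rdd = new JdbcRDD(
      sc,
      () => {
        Class.forName("org.apache.phoenix.jdbc.PhoenixDriver")
        DriverManager.getConnection("jdbc:phoenix:localhost")
      },
      "SELECT ts, ename FROM random_data_date WHERE row_id >= ? AND row_id <= ?",
      1L,         // lowerBound: smallest row_id expected in the table
      1000000L,   // upperBound: largest row_id expected in the table
      10,         // numPartitions: how many bounded queries / tasks to run
      r => (r.getString("ts"), r.getString("ename")))

    rdd.take(10).foreach(println)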

Re: Spark SQL with Apache Phoenix lower and upper Bound

2014-11-21 Thread Josh Mahonin
Hi Alaa Ali, In order for Spark to split the JDBC query in parallel, it expects an upper and lower bound for your input data, as well as a number of partitions so that it can split the query across multiple tasks. For example, depending on your data distribution, you could set an upper and lower ...
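
Concretely, Spark's JdbcRDD divides the [lowerBound, upperBound] range into numPartitions roughly equal sub-ranges and runs one bounded copy of the query per partition. The snippet below only illustrates that splitting, mirroring rather than reproducing the actual implementation:

    val lowerBound = 1L
    val upperBound = 1000L
    val numPartitions = 4

    val length = BigInt(upperBound) - BigInt(lowerBound) + 1
    val ranges = (0 until numPartitions).map { i =>
      val start = lowerBound + (BigInt(i) * length / numPartitions).toLong
      val end   = lowerBound + (BigInt(i + 1) * length / numPartitions).toLong - 1
      (start, end)
    }
    // ranges == Vector((1,250), (251,500), (501,750), (751,1000))
    // Each pair is substituted into the query's two '?' placeholders, e.g.
    // ... WHERE row_id >= 1 AND row_id <= 250 for the first task.

This is why the bounds should bracket the actual values in the partition column: rows outside [lowerBound, upperBound] are never fetched, and a heavily skewed column leads to unbalanced tasks.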

Re: Spark SQL with Apache Phoenix lower and upper Bound

2014-11-21 Thread Alaa Ali
Awesome, thanks Josh, I missed that previous post of yours! But your code snippet shows a select statement, so what I can do is just run a simple select with a where clause if I want to, and then run my data processing on the RDD to mimic the aggregation I want to do with SQL, right? Also, another ...
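
For what it's worth, the select-with-WHERE plus RDD-side processing that Alaa describes maps fairly directly onto pair-RDD operations. A small hypothetical illustration (names and sample data assumed, using the spark-shell's sc), mimicking SELECT ename, COUNT(*) FROM random_data_date WHERE ... GROUP BY ename:

    import org.apache.spark.SparkContext._    // pair-RDD functions in Spark 1.x

    // Stand-in for the (ts, ename) pairs returned by the select-with-WHERE.
    val rows = sc.parallelize(Seq(
      ("2014-11-21 10:00:00", "alice"),
      ("2014-11-21 10:05:00", "bob"),
      ("2014-11-22 09:00:00", "alice")))

    val countByEname = rows
      .map { case (_, ename) => (ename, 1L) }   // key each row by ename
      .reduceByKey(_ + _)                        // equivalent of COUNT(*) ... GROUP BY ename

    countByEname.collect().foreach(println)      // (alice,2), (bob,1)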

Re: Spark SQL with Apache Phoenix lower and upper Bound

2014-11-21 Thread Alex Kamil
Ali, just create a BIGINT column with numeric values in Phoenix and use sequences (http://phoenix.apache.org/sequences.html) to populate it automatically. I included the setup below in case someone starts from scratch. Prerequisites: - export JAVA_HOME, SCALA_HOME and install sbt - install hbase in ...
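
A rough sketch of the sequence-based setup described above, run through the Phoenix JDBC driver; the table layout reuses the random_data_date example from the original question, and the sequence name, column types and connection string are assumptions (see http://phoenix.apache.org/sequences.html for the full syntax):

    import java.sql.DriverManager

    Class.forName("org.apache.phoenix.jdbc.PhoenixDriver")
    val conn = DriverManager.getConnection("jdbc:phoenix:localhost")
    val stmt = conn.createStatement()

    // A Phoenix-managed table with a BIGINT id column to partition the JdbcRDD on.
    stmt.execute("""CREATE TABLE IF NOT EXISTS random_data_date (
                   |  row_id BIGINT NOT NULL PRIMARY KEY,
                   |  ts VARCHAR,
                   |  ename VARCHAR)""".stripMargin)

    // The sequence supplies monotonically increasing values for row_id.
    stmt.execute("CREATE SEQUENCE IF NOT EXISTS row_id_seq")
    stmt.execute("UPSERT INTO random_data_date VALUES (NEXT VALUE FOR row_id_seq, '2014-11-21 10:00:00', 'alice')")

    conn.commit()   // Phoenix connections do not auto-commit by default
    conn.close()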