Re: When will Spark SQL support building DB index natively?

2014-12-18 Thread Michael Armbrust
It is implemented in the same way as in Hive and interoperates with the Hive
metastore.  In 1.2 we are considering adding partitioning to the Spark SQL
data source API as well.  However, for now, you should create a HiveContext
and a partitioned table.  Spark SQL will automatically select partitions
when a query has predicates on the partitioning columns.
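
A minimal sketch of that flow (Spark 1.2-era API; the table, column, and
path names below are only placeholders):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.hive.HiveContext

  val sc = new SparkContext(new SparkConf().setAppName("partitioned-lookup"))
  val hiveContext = new HiveContext(sc)

  // Partition the table by the lookup column instead of indexing it.
  hiveContext.sql(
    "CREATE TABLE IF NOT EXISTS events (event_time STRING, payload STRING) " +
    "PARTITIONED BY (customer_id STRING)")

  // Load one customer's data into its own partition.
  hiveContext.sql(
    "LOAD DATA INPATH '/data/events/42' " +
    "INTO TABLE events PARTITION (customer_id = '42')")

  // The predicate on customer_id prunes the scan to a single partition,
  // so only that partition's files are read.
  hiveContext.sql("SELECT * FROM events WHERE customer_id = '42'").collect()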


When will Spark SQL support building DB index natively?

2014-12-17 Thread Xuelin Cao

Hi, 
     In the Spark SQL documentation, it says: "Some of these (such as indexes)
are less important due to Spark SQL’s in-memory computational model. Others
are slotted for future releases of Spark SQL."
   - Block level bitmap indexes and virtual columns (used to build indexes)

     For our use cases, a DB index is quite important. We have about 300 GB of
data in our database, and we always use customer id as the predicate for DB
lookups. Without an index, we have to scan all 300 GB, and a simple lookup
takes over 1 minute, while MySQL takes only 10 seconds. We tried creating an
independent table for each customer id; the results are pretty good, but the
logic becomes very complex.
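The complexity comes from having to route every query to the right table; a
hypothetical sketch of the pattern (table and column names are made up):

  import org.apache.spark.sql.hive.HiveContext

  // One table per customer: the customer id must be spliced into the
  // table name before every lookup, and schema changes multiply.
  def lookup(hive: HiveContext, customerId: String) = {
    val table = s"events_$customerId"
    hive.sql(s"SELECT * FROM $table WHERE event_time >= '2014-12-01'")
  }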
     I'm wondering when Spark SQL will support DB indexes, and until then, is
there an alternative way to get the same functionality?
Thanks


Re: When will Spark SQL support building DB index natively?

2014-12-17 Thread Michael Armbrust
- Dev list

Have you looked at partitioned table support?  That only scans the data in
partitions that match the predicate.  Depending on the cardinality of the
customerId column, that could be a good option for you.
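
To make that concrete: a partitioned Hive table stores each partition in its
own directory, so a predicate on the partitioning column can skip whole
directories (paths and names here are illustrative):

  // Layout of a table partitioned by customer_id:
  //   .../warehouse/events/customer_id=41/
  //   .../warehouse/events/customer_id=42/
  // A predicate on the partitioning column reads only one directory:
  hiveContext.sql("SELECT count(*) FROM events WHERE customer_id = '42'")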


Re: When will Spark SQL support building DB index natively?

2014-12-17 Thread Xuelin Cao
Thanks, I hadn't tried partitioned table support (it sounds like a Hive
feature). Is there a guideline? Should I use a HiveContext to create the
partitioned table first?

