- Dev list

Have you looked at partitioned table support? That would only scan data
where the predicate matches the partition. Depending on the cardinality of
the customerId column, that could be a good option for you.
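
For concreteness, here's a rough sketch of what that could look like with
the DataFrame API (note this API is newer than the 1.x releases current at
the time of this thread, and the path, column name, and filter value are
placeholders for illustration):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("partition-pruning").getOrCreate()

// Rewrite the data partitioned by customerId: each distinct value
// becomes its own directory, e.g. .../customerId=42/.
spark.read.parquet("/data/events")
  .write
  .partitionBy("customerId")
  .parquet("/data/events_by_customer")

// A filter on the partition column lets Spark prune partitions and
// read only the matching directory instead of scanning everything.
spark.read.parquet("/data/events_by_customer")
  .where("customerId = 42")
  .show()

One caveat: if customerId has very high cardinality, you end up with a huge
number of small directories and files. In that case you might partition on
a coarser key, such as a hash bucket of the id, which keeps pruning
effective while bounding the number of partitions.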

On Wed, Dec 17, 2014 at 2:25 AM, Xuelin Cao <xuelin...@yahoo.com.invalid>
wrote:
>
>
> Hi,
>      In the Spark SQL documentation, it says: "Some of these (such as
> indexes) are less important due to Spark SQL’s in-memory computational
> model. Others are slotted for future releases of Spark SQL.
>    - Block level bitmap indexes and virtual columns (used to build
> indexes)"
>
>      For our use cases, a database index is quite important. We have
> about 300 GB of data in our database, and we always use "customer id" as
> a predicate for lookups. Without an index, we have to scan all 300 GB,
> and a simple lookup takes more than a minute, while MySQL takes only 10
> seconds. We tried creating an independent table for each "customer id";
> the results were pretty good, but the logic becomes very complex.
>      I'm wondering when Spark SQL will support database indexes, and
> until then, is there an alternative way to get the same functionality?
> Thanks
>
