Seems like a great idea to do?
On Fri, Jun 16, 2017 at 12:03 PM, Russell Spitzer <russell.spit...@gmail.com> wrote:

> I considered adding this to the DataSource API V2 ticket, but I didn't want
> to be first :P Do you think there will be any issues with opening up the
> partitioning as well?
>
> On Fri, Jun 16, 2017 at 11:58 AM Reynold Xin <r...@databricks.com> wrote:
>
>> Perhaps we should extend the data source API to support that.
>>
>>
>> On Fri, Jun 16, 2017 at 11:37 AM, Russell Spitzer <
>> russell.spit...@gmail.com> wrote:
>>
>>> I've been trying to make Catalyst aware of Cassandra partitioning.
>>> There seem to be two major blockers.
>>>
>>> The first is that DataSourceScanExec is unable to learn what the
>>> underlying partitioning should be from the BaseRelation it comes from.
>>> I'm currently able to get around this by using the DataSourceStrategy
>>> plan and then transforming the resulting DataSourceScanExec.
>>>
>>> The second is that the Partitioning trait is sealed. I want to define a
>>> new partitioning that is clustered on certain columns but is not
>>> hash-based. It would look almost identical to the HashPartitioning class,
>>> except that the expression which returns a valid PartitionID for the
>>> given expressions would be different.
>>>
>>> Does anyone have ideas on how to get around the second issue? Would it
>>> be worthwhile to make changes that allow BaseRelations to advertise a
>>> particular Partitioner?
>>>
>>
>>
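To make the second point concrete: because Catalyst's Partitioning trait is sealed, a partitioning like the one described cannot be added from outside Spark. The following is a Spark-free Scala sketch, under stated assumptions, of what such a clustered-but-not-hashed partitioning and a relation that advertises it might look like. Every name in it (`ClusteredPartitioning`, `HashClustered`, `TokenRangeClustered`, `AdvertisesPartitioning`, `ExampleTableRelation`) is hypothetical and is not a Spark or Cassandra API; `hashCode` stands in for the hash function a real engine would use, and the token function is supplied by the caller rather than being Cassandra's actual token computation.

```scala
// A Spark-free model of the proposal. Every name here is hypothetical and is
// not part of the Spark or Cassandra APIs.
object PartitioningSketch {

  // A "clustered" partitioning: rows with equal values in keyColumns always
  // map to the same partition ID.
  sealed trait ClusteredPartitioning {
    def keyColumns: Seq[String]
    def numPartitions: Int
    def partitionId(key: Seq[Any]): Int
  }

  // Hash-style clustering: non-negative modulo of a hash of the key values.
  // (hashCode stands in for whatever hash function a real engine uses.)
  final case class HashClustered(keyColumns: Seq[String], numPartitions: Int)
      extends ClusteredPartitioning {
    def partitionId(key: Seq[Any]): Int = Math.floorMod(key.hashCode, numPartitions)
  }

  // Token-range clustering: each partition owns a contiguous range of tokens.
  // upperBounds are exclusive and ascending; the last partition takes
  // everything at or above the final bound. The token function is supplied
  // by the caller (an assumption, not Cassandra's real token computation).
  final case class TokenRangeClustered(
      keyColumns: Seq[String],
      upperBounds: Vector[Int],
      token: Seq[Any] => Int
  ) extends ClusteredPartitioning {
    def numPartitions: Int = upperBounds.size + 1
    def partitionId(key: Seq[Any]): Int = {
      val t = token(key)
      val i = upperBounds.indexWhere(bound => t < bound)
      if (i == -1) upperBounds.size else i
    }
  }

  // Hypothetical hook a relation could mix in to advertise how its data is
  // already laid out, so a planner could avoid an unnecessary shuffle.
  trait AdvertisesPartitioning {
    def outputPartitioning: Option[ClusteredPartitioning]
  }

  // Example relation advertising a token-range layout on one key column.
  final case class ExampleTableRelation(layout: TokenRangeClustered)
      extends AdvertisesPartitioning {
    def outputPartitioning: Option[ClusteredPartitioning] = Some(layout)
  }
}
```

Both variants are clustered on the same key column, so equal keys always land in the same partition, but they map keys to different partition IDs. That difference is why a planner could not treat a token-range layout as interchangeable with hash partitioning unless it knows the layout's partition-ID function.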