Re: Custom Partitioning in Catalyst

Reynold Xin Fri, 16 Jun 2017 12:04:49 -0700

Seems like a great idea to do?


On Fri, Jun 16, 2017 at 12:03 PM, Russell Spitzer <russell.spit...@gmail.com
> wrote:

> I considered adding this to the DataSource API V2 ticket, but I didn't
> want to be first :P Do you think there will be any issues with opening up
> the partitioning as well?
>
> On Fri, Jun 16, 2017 at 11:58 AM Reynold Xin <r...@databricks.com> wrote:
>
>> Perhaps we should extend the data source API to support that.
>>
>>
>> On Fri, Jun 16, 2017 at 11:37 AM, Russell Spitzer <
>> russell.spit...@gmail.com> wrote:
>>
>>> I've been trying to make Catalyst aware of Cassandra partitioning. There
>>> seem to be two major blockers.
>>>
>>> The first is that DataSourceScanExec is unable to learn what the
>>> underlying partitioning should be from the BaseRelation it comes from. I'm
>>> currently able to get around this by using the DataSourceStrategy plan and
>>> then transforming the resultant DataSourceScanExec.
>>>
>>> The second is that the Partitioning trait is sealed. I want to define a
>>> new partitioning which is clustered on certain columns but not hashed on
>>> them. It would look almost identical to the HashPartitioning class,
>>> except that the expression which returns a valid partition ID for the
>>> given expressions would be different.
>>>
>>> Anyone have any ideas on how to get around the second issue? Would it be
>>> worthwhile to make changes to allow BaseRelations to advertise a
>>> particular Partitioner?
>>>
>>
>>
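To make the second point in the quoted message concrete, here is a minimal, Spark-free sketch of a partitioning that is clustered on its key columns without being hash based. It is not Catalyst's API: the trait below is a simplified stand-in for the sealed Partitioning trait, and every name in it (SimplePartitioning, HashLikePartitioning, TokenRangePartitioning, the token function) is hypothetical. The only difference between the two implementations is how a key is mapped to a partition ID, which is the difference the message describes.

```scala
// Hypothetical stand-in for Catalyst's sealed Partitioning trait.
// It is deliberately NOT sealed, so new implementations can be added.
trait SimplePartitioning {
  def numPartitions: Int
  // Maps the values of the clustering columns to a partition ID.
  def partitionId(key: Seq[Any]): Int
}

// Hash-style placement: partition ID derived from the key's hash code.
final case class HashLikePartitioning(numPartitions: Int)
    extends SimplePartitioning {
  def partitionId(key: Seq[Any]): Int =
    Math.floorMod(key.hashCode, numPartitions)
}

// Token-range placement (assumed model): each partition owns the tokens up
// to and including its upper bound; tokens above the last bound wrap to 0.
final case class TokenRangePartitioning(
    upperBounds: Vector[Long],
    token: Seq[Any] => Long // assumption: supplied by the data source
) extends SimplePartitioning {
  require(upperBounds.nonEmpty, "at least one token range is required")
  require(upperBounds == upperBounds.sorted, "bounds must be ascending")

  val numPartitions: Int = upperBounds.length

  def partitionId(key: Seq[Any]): Int = {
    val t = token(key)
    val idx = upperBounds.indexWhere(bound => t <= bound)
    if (idx >= 0) idx else 0 // wrap around the token ring
  }
}
```

Both classes place rows with equal keys in the same partition, which is the clustering property a planner would need in order to skip a shuffle. Whether Catalyst could accept such a partitioning from a data source is exactly the open question in this thread.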
