Re: Feature request: split dataset based on condition

2019-02-02 Thread Sean Owen
I think the problem is that you can't produce multiple Datasets from one source
in one operation - consider that reproducing one of them would mean
reproducing all of them. You can write a method that would do the filtering
multiple times, but it wouldn't be faster. What do you have in mind that's
different?
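
To make the point above concrete, here is a minimal plain-Python sketch (not Spark code) of a helper that "does the filtering multiple times". The names `CountingSource` and `split_by_filters` are hypothetical and exist only for this illustration. Each predicate triggers its own full scan of the source, which suggests why wrapping repeated filters in one method is a convenience rather than a speedup.

```python
from typing import Callable, Dict, Iterable, Iterator, List


class CountingSource:
    """Hypothetical re-iterable source that counts how often it is scanned."""

    def __init__(self, rows: Iterable[int]) -> None:
        self._rows = list(rows)
        self.scans = 0

    def __iter__(self) -> Iterator[int]:
        self.scans += 1  # one more pass over the underlying data
        return iter(self._rows)


def split_by_filters(
    source: Iterable[int],
    predicates: Dict[str, Callable[[int], bool]],
) -> Dict[str, List[int]]:
    """Apply each predicate as an independent filter: one scan per predicate."""
    return {
        name: [row for row in source if pred(row)]
        for name, pred in predicates.items()
    }


source = CountingSource(range(10))
parts = split_by_filters(
    source,
    {"even": lambda x: x % 2 == 0, "big": lambda x: x >= 7},
)
```

After this runs, `source.scans` equals the number of predicates. In Spark the analogous pattern would presumably be one `filter` call per condition on the same Dataset, typically after `cache()` or `persist()` so the source is not recomputed for each one.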

On Sat, Feb 2, 2019 at 12:19 AM Moein Hosseini wrote:

> I've seen many applications that need to split a dataset into multiple
> datasets based on some conditions. As there is no method to do it in one
> place, developers call the *filter* method multiple times. I think it would
> be useful to have a method that splits a dataset based on conditions in one
> iteration, something like the *partition* method in Scala (of course, Scala's
> partition just splits a list into two lists, but something more general
> would be more useful).
> If you think it would be helpful, I can create a Jira issue and work on it
> to send a PR.
>
> Best Regards
> Moein
>
> --
>
> Moein Hosseini
> Data Engineer
> mobile: +98 912 468 1859
> site: www.moein.xyz
> email: moein...@gmail.com
>
>


Re: Feature request: split dataset based on condition

2019-02-02 Thread Moein Hosseini
I don't see it as a method that applies filtering multiple times; instead, I'd
use it as a semi-action rather than just a transformation. Suppose we had
something like map-partition that accepts multiple lambdas, each of which
collects the rows for its own dataset (or something like it). Is that possible?
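
As a rough illustration of the idea above, here is a plain-Python sketch (not a Spark API) of a single-pass split that accepts several lambdas and lets each one collect its own rows, generalizing a two-way partition. The name `split_once` is hypothetical. The input is deliberately a one-shot iterator to show that the data is read only once.

```python
from typing import Callable, Dict, Iterable, List, TypeVar

T = TypeVar("T")


def split_once(
    rows: Iterable[T],
    predicates: Dict[str, Callable[[T], bool]],
) -> Dict[str, List[T]]:
    """Read every row once and append it to each bucket whose predicate accepts it."""
    buckets: Dict[str, List[T]] = {name: [] for name in predicates}
    for row in rows:
        for name, pred in predicates.items():
            if pred(row):
                buckets[name].append(row)
    return buckets


# A one-shot iterator: it can only be consumed a single time.
buckets = split_once(
    iter(range(10)),
    {"even": lambda x: x % 2 == 0, "big": lambda x: x >= 7},
)
```

Note that all buckets are built together and held in memory. That mirrors the objection quoted below: in a lazy, recomputable system, rebuilding one output this way would mean rebuilding all of them.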

On Sat, Feb 2, 2019 at 5:59 PM Sean Owen wrote:

> I think the problem is that you can't produce multiple Datasets from one
> source in one operation - consider that reproducing one of them would mean
> reproducing all of them. You can write a method that would do the filtering
> multiple times, but it wouldn't be faster. What do you have in mind that's
> different?
