Re: Flink SQL and data shuffling (keyBy)

2022-04-06 Marios Trivyzas
Happy to help,

Let us know if it helped in your use case.

-- 
Marios


Re: Flink SQL and data shuffling (keyBy)

2022-04-04 Yaroslav Tkachenko
Hi Marios,

Thank you, this looks very promising!


Re: Flink SQL and data shuffling (keyBy)

2022-04-04 Marios Trivyzas
Hi again,

Maybe you can use *table.exec.sink.keyed-shuffle* (see
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/#table-exec-sink-keyed-shuffle)
and set it to *FORCE*, which will use the primary key column(s) to
partition and distribute the data.
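
For illustration, a minimal sketch of setting it programmatically with the
Table API (class name and setup are made up; the option name and the FORCE
value come from the linked docs):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class KeyedShuffleExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Hash-partition each sink's input on that sink table's
        // primary-key columns before writing, regardless of parallelism.
        tEnv.getConfig().getConfiguration()
            .setString("table.exec.sink.keyed-shuffle", "FORCE");

        // ... register the source/sink tables and run the INSERT as usual.
    }
}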

Best,
Marios


Re: Flink SQL and data shuffling (keyBy)

2022-04-01 Marios Trivyzas
Hi!

I don't think there is a way to achieve that without resorting to the
DataStream API.
I don't know if using the PARTITIONED BY clause in the CREATE statement of
the table can help to "balance" the data, see
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/#partitioned-by.
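
For reference, a rough sketch of such a DDL (all names are made up; note
that PARTITIONED BY is only honored by connectors that support partitioned
tables, e.g. filesystem, and as far as I know not by JDBC):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class PartitionedTableExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Hypothetical partitioned table: rows are written into one
        // directory per distinct value of the dt column.
        tEnv.executeSql(
            "CREATE TABLE events ("
                + "  id BIGINT,"
                + "  payload STRING,"
                + "  dt STRING"
                + ") PARTITIONED BY (dt) WITH ("
                + "  'connector' = 'filesystem',"
                + "  'path' = '/tmp/events',"
                + "  'format' = 'json')");
    }
}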


-- 
Marios


Flink SQL and data shuffling (keyBy)

2022-03-30 Yaroslav Tkachenko
Hey everyone,

I'm trying to use Flink SQL to construct a set of transformations for my
application. Let's say the topology just has three steps:

- SQL Source
- SQL SELECT statement
- SQL Sink (via INSERT)

The sink I'm using (JDBC) would really benefit from data partitioning (by
PK ID) to avoid conflicting transactions and deadlocks. I can force Flink
to partition the data by the PK ID before the INSERT by resorting to the
DataStream API and leveraging the keyBy method, then transforming the
DataStream back to a Table again...
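
Roughly, that round-trip looks like this (a minimal sketch; table and
column names are made up, and "id" stands in for the PK column):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class KeyByRoundTrip {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Hypothetical source table registered elsewhere.
        Table selected = tEnv.sqlQuery("SELECT id, payload FROM source_table");

        // Detour through the DataStream API just to repartition by the PK.
        DataStream<Row> keyed = tEnv.toChangelogStream(selected)
            .keyBy(row -> row.<Long>getFieldAs("id"));

        // Back to a Table, then the plain INSERT.
        tEnv.createTemporaryView("repartitioned",
            tEnv.fromChangelogStream(keyed));
        tEnv.executeSql("INSERT INTO sink_table SELECT * FROM repartitioned");
    }
}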

Is there a simpler way to do this? I understand that, for example, a GROUP
BY statement will probably perform similar data shuffling, but what if I
have a simple SELECT followed by INSERT?

Thank you!