[jira] [Comment Edited] (FLINK-6966) Add maxParallelism and UIDs to all operators generated by the Table API / SQL

2018-09-30 Thread Fan weiwen (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633442#comment-16633442
 ] 

Fan weiwen edited comment on FLINK-6966 at 9/30/18 5:04 PM:


this is my test sql
{code:java}
//代码占位符
i use redis  sink   but can test in kafka or other sink
Asql
select
reqIp as factorContenta,
count(*) as eCount,
60 * 60 as expire
from
kafkasource
where
uri in not null 
group by
hop(
rowtime,
interval '2' second,
interval '60' minute
),
reqIp
having
count(*) >= 1


Bsql
select
uid as factorContentb,
count(*) as eCount,
60 * 60 as expire
from
kafkasource
where
uri is not null
group by
hop(
rowtime,
interval '2' second,
interval '60' minute
),
uid
having
count(*) >= 1

{code}
this is my test data
{code:java}
//代码占位符
{
"code" : "200",
"reqIp" : "656.19.173.34",
"t" : 1537950912546,
"uid" : 6630519,
"uri" : "/test"
}

{code}
first  the job only Asql , start A sql     redis has a key   656.19.173.34

then stop Asql and savepoint       del redis key  656.19.173.34

now start Bsql from savepoint   you can  find it  key  656.19.173.34  is create

redis has  Asql create key  and Bsql create key

this is wrong, Bsql fetch from Asql state

 

 


was (Author: fanweiwen):
this is my test sql
{code:java}
//代码占位符
i use redis  sink   but can test in kafka or other sink
Asql
select
reqIp as factorContenta,
count(*) as eCount,
60 * 60 as expire
from
kafkasource
where
uri in not null 
group by
hop(
rowtime,
interval '2' second,
interval '60' minute
),
reqIp
having
count(*) >= 1


Bsql
select
uid as factorContentb,
count(*) as eCount,
60 * 60 as expire
from
kafkasource
where
uri is not null
group by
hop(
rowtime,
interval '2' second,
interval '60' minute
),
uid
having
count(*) >= 1

{code}
this is my test data
{code:java}
//代码占位符
{
"code" : "200",
"reqIp" : "656.19.173.34",
"t" : 1537950912546,
"uid" : 6630519,
"uri" : "/test"
}

{code}
first  the job only Asql , start A sql     redis has a key   656.19.173.34

then stop Asql and savepoint       del redis key  656.19.173.34

now start Bsql from savepoint   you can  find it  key  656.19.173.34  is create

redis has  Asql create key  and Bsql create key

this is wrong

 

 

> Add maxParallelism and UIDs to all operators generated by the Table API / SQL
> -
>
> Key: FLINK-6966
> URL: https://issues.apache.org/jira/browse/FLINK-6966
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API  SQL
>Affects Versions: 1.4.0
>Reporter: Fabian Hueske
>Priority: Major
>
> At the moment, the Table API does not assign UIDs and the max parallelism to 
> operators (except for operators with parallelism 1).
> We should do that to avoid problems when rescaling or restarting jobs from 
> savepoints.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-6966) Add maxParallelism and UIDs to all operators generated by the Table API / SQL

2018-09-30 Thread Fan weiwen (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633442#comment-16633442
 ] 

Fan weiwen edited comment on FLINK-6966 at 9/30/18 5:02 PM:


this is my test sql
{code:java}
//代码占位符
i use redis  sink   but can test in kafka or other sink
Asql
select
reqIp as factorContenta,
count(*) as eCount,
60 * 60 as expire
from
kafkasource
where
uri in not null 
group by
hop(
rowtime,
interval '2' second,
interval '60' minute
),
reqIp
having
count(*) >= 1


Bsql
select
uid as factorContentb,
count(*) as eCount,
60 * 60 as expire
from
kafkasource
where
uri is not null
group by
hop(
rowtime,
interval '2' second,
interval '60' minute
),
uid
having
count(*) >= 1

{code}
this is my test data
{code:java}
//代码占位符
{
"code" : "200",
"reqIp" : "656.19.173.34",
"t" : 1537950912546,
"uid" : 6630519,
"uri" : "/test"
}

{code}
first  the job only Asql , start A sql     redis has a key   656.19.173.34

then stop Asql and savepoint       del redis key  656.19.173.34

now start Bsql from savepoint   you can  find it  key  656.19.173.34  is create

redis has  Asql create key  and Bsql create key

this is wrong

 

 


was (Author: fanweiwen):
this is my test sql
{code:java}
//代码占位符
i use redis  sink   but can test in kafka or other sink
Asql
select
reqIp as factorContenta,
count(*) as eCount,
60 * 60 as expire
from
xp_sso_biz_source
where
uri in not null 
group by
hop(
rowtime,
interval '2' second,
interval '60' minute
),
reqIp
having
count(*) >= 1


Bsql
select
uid as factorContentb,
count(*) as eCount,
60 * 60 as expire
from
xp_sso_biz_source
where
uri is not null
group by
hop(
rowtime,
interval '2' second,
interval '60' minute
),
uid
having
count(*) >= 1

{code}
this is my test data
{code:java}
//代码占位符
{
"code" : "200",
"reqIp" : "656.19.173.34",
"t" : 1537950912546,
"uid" : 6630519,
"uri" : "/test"
}

{code}
first  the job only Asql , start A sql     redis has a key   656.19.173.34

then stop Asql and savepoint       del redis key  656.19.173.34

now start Bsql from savepoint   you can  find it  key  656.19.173.34  is create

redis has  Asql create key  and Bsql create key

this is wrong

 

 

> Add maxParallelism and UIDs to all operators generated by the Table API / SQL
> -
>
> Key: FLINK-6966
> URL: https://issues.apache.org/jira/browse/FLINK-6966
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API  SQL
>Affects Versions: 1.4.0
>Reporter: Fabian Hueske
>Priority: Major
>
> At the moment, the Table API does not assign UIDs and the max parallelism to 
> operators (except for operators with parallelism 1).
> We should do that to avoid problems when rescaling or restarting jobs from 
> savepoints.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-6966) Add maxParallelism and UIDs to all operators generated by the Table API / SQL

2017-06-22 Thread sunjincheng (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059493#comment-16059493
 ] 

sunjincheng edited comment on FLINK-6966 at 6/22/17 2:53 PM:
-

Hi [~fhueske] That's true. I got it. In our side we already meet that problem. 
we can not  increase the parallelism of an operator if we do not set 
{{maxParallelism}}. 

* About {{maxParallelism}}, it is not easy to set every operator's 
{{maxParallelism}} by SQL/TableAPI. I think we can set global 
{{maxParallelism}} by {{StreamExecutionEnvironment}}. Of course we can define a 
default value for user or give a hint message if the user is not set. 

* About UID, for the first version I think we can just deal with the operators 
which are generated by {{DataStreamRel#translateToPlan}}.

Please let me know what you think?

Hi [~jark] I think {{QueryConfig}} is difficult to set set every operator's 
{{maxParallelism}}, If we want a global value, the 
{{StreamExecutionEnvironment}} is enough. What do you think?

Feel free to correct me If there are any incorrect describe. [~fhueske] [~jark]


was (Author: sunjincheng121):
Hi [~fhueske] That's true. I got it. In our side we already meet that problem. 
we can not  increase the parallelism of an operator if we do not set 
{{maxParallelism}}. 

* About {{maxParallelism}}, it is not easy to set every operator's 
{{maxParallelism}} by SQL/TableAPI. I think we can set global 
{{maxParallelism}} by {{StreamExecutionEnvironment}}. Of course we can define a 
default value for user or give a hint message if the user is not set. 

* About UID, for the first version I think we can just deal with the operators 
which are generated by {{DataStreamRel#translateToPlan}}.

Please let me know what you think?

Hi [~jark] I think {{QueryConfig }} is difficult to set set every operator's 
{{maxParallelism}}, If we want a global value, the 
{{StreamExecutionEnvironment}} is enough. What do you think?

Feel free to correct me If there are any incorrect describe. [~fhueske] [~jark]

> Add maxParallelism and UIDs to all operators generated by the Table API / SQL
> -
>
> Key: FLINK-6966
> URL: https://issues.apache.org/jira/browse/FLINK-6966
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Affects Versions: 1.4.0
>Reporter: Fabian Hueske
>
> At the moment, the Table API does not assign UIDs and the max parallelism to 
> operators (except for operators with parallelism 1).
> We should do that to avoid problems when rescaling or restarting jobs from 
> savepoints.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (FLINK-6966) Add maxParallelism and UIDs to all operators generated by the Table API

2017-06-21 Thread sunjincheng (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057656#comment-16057656
 ] 

sunjincheng edited comment on FLINK-6966 at 6/21/17 3:19 PM:
-

Hi, [~fhueske] do you meant that we need call {{uid/setUidHash}} when we 
translate to DataStream? And we provided unique UID for per transformation and 
job.If so, we need to consider all the Operators which are in the 
{{table.scala}}. right? 



was (Author: sunjincheng121):
Hi, [~fhueske] do you meant that we need call {{uid/setUidHash}} when we 
translate to DataStream? And we provided unique UID for per transformation and 
job. right?


> Add maxParallelism and UIDs to all operators generated by the Table API
> ---
>
> Key: FLINK-6966
> URL: https://issues.apache.org/jira/browse/FLINK-6966
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Affects Versions: 1.4.0
>Reporter: Fabian Hueske
>
> At the moment, the Table API does not assign UIDs and the max parallelism to 
> operators (except for operators with parallelism 1).
> We should do that to avoid problems when rescaling or restarting jobs from 
> savepoints.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)