Re:Re: Re: Some question with Flink state

From: Xuyang
To: user-zh

I'm not sure, but most likely when you keyBy twice only the second one takes effect, so your first keyBy is probably a no-op (worth verifying with an experiment). You can do what you described: concatenate the two keys in the data into a single string and use that as the shuffle key.

On 2022-05-24 21:06:58, "lxk7...@163.com" wrote:
>If the problem is the two keyBy calls, can I just concatenate the two fields into one string inside a single keyBy? Would that have the same effect as the two keyBy calls?
>
>lxk7...@163.com
>
>From: Xuyang
>Date: 2022-05-24 20:51
>To: user-zh
>Subject: Re:Re: Re: Some question with Flink state
>It looks like you call keyBy twice; you can define a single custom KeySelector to replace both. Also, if you are worried that records with the same key are not being routed to the same parallel subtask, you can print each record together with the subtask's parallelism index inside the operator and debug from there.
>On 2022-05-24 20:43:19, "lxk7...@163.com" wrote:
>>
>>https://s2.loli.net/2022/05/24/SgAWefJpaxtOH5l.png
>>https://s2.loli.net/2022/05/24/54dZkr19QCh3Djf.png
>>
>>How about these?
>>
>>lxk7...@163.com
>>
>>From: Xuyang
>>Date: 2022-05-24 20:17
>>To: user-zh
>>Subject: Re:Re: Re: Some question with Flink state
>>Hi, your images are still broken; try an image-hosting service.
>>
>>On 2022-05-24 13:50:34, "lxk7...@163.com" wrote:
>>
>>There seems to be a problem with the images; re-uploading them.
>>lxk7...@163.com
>>From: Hangxiang Yu
>>Date: 2022-05-24 12:09
>>To: user-zh
>>Subject: Re: Re: Some question with Flink state
>>Is this a DataStream job? If records with the same key end up in different parallel subtasks, it may be related to the KeySelector you wrote (you can check the comments on KeySelector to see whether yours meets its contract);
>>or, if convenient, please share the logic of your KeySelector and of your state usage;
>>On Tue, May 24, 2022 at 9:59 AM lxk7...@163.com wrote:
>>> OK. I saw the mails on this list were all in English, so I asked my question in English.
>>>
>>> Let me restate my problem. I use keyed state, specifically ValueState. In theory, records with the same key should be processed by the same parallel subtask. But when I run with multiple parallel subtasks, records with the same key do not seem to be handled by the same one. Concretely, my program accumulates the items a given user has clicked; in the data a certain item already appears for the second time, yet its state in the program is empty, so the final count is 1 while the correct result should be 2. So I guessed that each operator instance keeps its own private ValueState.
>>>
>>> But when I used MapState, the problem seemed to go away. I'd like to understand the reason, or find a way to guarantee that all records with the same key are processed by the same task.
>>>
>>> lxk7...@163.com
>>>
>>> From: Hangxiang Yu
>>> Date: 2022-05-23 23:09
>>> To: user-zh; lxk7491
>>> Subject: Re: Some question with Flink state
>>> Hello,
>>> No state is shared across different parallel instances.
>>> BTW, English questions can be sent to u...@flink.apache.org.
>>>
>>> Best,
>>> Hangxiang.
>>>
>>> On Mon, May 23, 2022 at 4:03 PM lxk7...@163.com wrote:
>>>
>>> > Hi everyone,
>>> > I used Flink keyed state in my project, but ran into something that confused me.
>>> > When I used ValueState with multiple parallel subtasks, the value was not what I expected. So I guessed that each parallel instance keeps its own ValueState, i.e. the value is thread-level.
>>> > But when I used MapState, the value was correct, as if the MapState were shared across all parallel instances.
>>> > Looking forward to your reply.
>>> >
>>> > lxk7...@163.com
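The fix the thread converges on, a single keyBy on a composite key, can be sketched in plain Java. The `ClickEvent` type and its field names are assumptions for illustration (the thread never shows the actual event class); in a real job the `compositeKey` logic would live in a `KeySelector<ClickEvent, String>` passed to a single `stream.keyBy(...)` call.

```java
public class CompositeKeyDemo {

    // Hypothetical event type; the job in the thread presumably has
    // a user id and an item id on each click record.
    static class ClickEvent {
        final String userId;
        final String itemId;

        ClickEvent(String userId, String itemId) {
            this.userId = userId;
            this.itemId = itemId;
        }
    }

    // The logic a KeySelector<ClickEvent, String> would implement. In the
    // Flink job: stream.keyBy(e -> compositeKey(e)) -- one keyBy with a key
    // built from both fields, instead of two chained keyBy calls.
    static String compositeKey(ClickEvent e) {
        // The separator must not occur in either field, otherwise distinct
        // (userId, itemId) pairs could collide on the same string key.
        return e.userId + "|" + e.itemId;
    }

    public static void main(String[] args) {
        System.out.println(compositeKey(new ClickEvent("u42", "sku7"))); // u42|sku7
    }
}
```

The important property is that the key is a pure, deterministic function of the event's fields, which is exactly the contract Hangxiang points at in the KeySelector comments.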
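Xuyang's debugging tip (print each record together with its subtask index) works because key-to-subtask assignment is deterministic. The sketch below is a simplified stand-in, not Flink's real algorithm: Flink actually murmur-hashes the key into a key group and then maps key groups onto subtasks. The property being checked in the logs is the same, though: an equal key must always land on the same subtask index.

```java
public class KeyRoutingDemo {

    // Simplified stand-in for Flink's key -> subtask assignment (the real
    // implementation hashes into key groups first). What matters for the
    // debugging approach in the thread is that the mapping is a pure
    // function of the key.
    static int subtaskFor(String key, int parallelism) {
        return (key.hashCode() & Integer.MAX_VALUE) % parallelism;
    }

    public static void main(String[] args) {
        int parallelism = 4;
        // Two records with the same composite key must route identically;
        // if the job's logs show otherwise, the KeySelector is likely
        // non-deterministic (e.g. keyed on a mutable or recomputed field).
        System.out.println(
            subtaskFor("u42|sku7", parallelism) == subtaskFor("u42|sku7", parallelism)); // true
    }
}
```

In a real RichFunction the index to print would come from `getRuntimeContext().getIndexOfThisSubtask()`.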
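The symptom described in the thread (a count of 1 where 2 was expected) is easiest to see against a model of what keyed `ValueState<Integer>` should provide: one independent counter slot per key. The sketch below models that with a plain map; the class and method names are illustrative, not from the original job. Inside a real KeyedProcessFunction, Flink scopes the ValueState to the current key automatically, so the map is implicit.

```java
import java.util.HashMap;
import java.util.Map;

public class PerKeyCounterDemo {

    // Plain-Java model of keyed ValueState<Integer>: one independent
    // counter per key, never shared across keys.
    private final Map<String, Integer> countPerKey = new HashMap<>();

    int recordClick(String compositeKey) {
        int next = countPerKey.getOrDefault(compositeKey, 0) + 1;
        countPerKey.put(compositeKey, next);
        return next;
    }

    public static void main(String[] args) {
        PerKeyCounterDemo counter = new PerKeyCounterDemo();
        System.out.println(counter.recordClick("u42|sku7")); // 1
        // Second click on the same key: the thread's job returned 1 here
        // (the state looked empty), which pointed at a keying problem
        // rather than at ValueState itself.
        System.out.println(counter.recordClick("u42|sku7")); // 2
    }
}
```

If the keying is correct, ValueState and MapState behave equivalently for this counting pattern; the MapState version only appearing to work suggests it was masking the mis-keying rather than fixing it.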