Hi, Rui. Overall LGTM now. Thanks for driving this again! On Wed, Nov 15, 2023 at 2:07 PM Rui Fan <1996fan...@gmail.com> wrote:
> Hi Hangxiang, > > Thanks for your feedback! > > > I saw some users may debug operator correctness using print() or > System.out > > which may bring lots of info. > > It's fine if files are separate, otherwise these will make > taskMnanager.log > > roll quickly. > > But I think it's fine if we don't use it as default first. > > Got it, we will support redirect System.out to LOG, and keep > original behaviour(DEFAULT enum) as default value. > > > > > Do you mean like this: LOG.info("taskName {} : {}", taskName, > > > userLogContext)? > > > > Yes, That's what I mean. > > Sounds make sense, I added a new option to FLIP[1], > it's taskmanager.system-out.log.thread-name.enabled: false. > > Why log the thread name here? > > Flink's RichFunction can get subtaskId, but redirecting > System.out to LOG is public code, so it is difficult to get subtaskId. > After consideration, a workaround solution is to log thread name, > because the thread name of the default Task Thread contains the > task name and subtask id. And this solution can support all threads, > because some non-task threads may also call println. > > [1] https://cwiki.apache.org/confluence/x/4guZE > > Best, > Rui > > On Wed, Nov 15, 2023 at 10:26 AM Hangxiang Yu <master...@gmail.com> wrote: > > > Hi, Rui. > > > > As I mentioned before: the user didn't really want > > > to log into taskmanager.out, it just happened by accident. > > > So, if users change the System.out to LOG.info, it still happen. > > > > Thanks for the feedback. > > I saw some users may debug operator correctness using print() or > System.out > > which may bring lots of info. > > It's fine if files are separate, otherwise these will make > taskMnanager.log > > roll quickly. > > But I think it's fine if we don't use it as default first. > > > > Do you mean like this: LOG.info("taskName {} : {}", taskName, > > > userLogContext)? > > > > > > > Yes, That's what I mean. > > > > On Sat, Nov 11, 2023 at 12:54 AM Piotr Nowojski < > piotr.nowoj...@gmail.com> > > wrote: > > > > > Thanks! :) > > > > > > Best, Piotrek > > > > > > czw., 9 lis 2023 o 16:15 Rui Fan <1996fan...@gmail.com> napisał(a): > > > > > > > Hi Piotr, > > > > > > > > Thanks for your feedback! > > > > > > > > > Or implement your own loop? It shouldn't be more than a couple of > > > lines. > > > > > > > > Implementing it directly is fine, I have updated the FLIP. > > > > And this logic can be found in the `isLineEnded` method. > > > > > > > > Best, > > > > Rui > > > > > > > > On Thu, Nov 9, 2023 at 11:00 PM Piotr Nowojski < > > piotr.nowoj...@gmail.com > > > > > > > > wrote: > > > > > > > > > Hi Rui, > > > > > > > > > > > I see java8 doesn't have `Arrays.equals(int[] a, int aFromIndex, > > int > > > > > > aToIndex, int[] b, int bFromIndex, int bToIndex)`, > > > > > > and java11 has it. Do you have any other suggestions for java8? > > > > > > > > > > Maybe use `ByteBuffer.wrap`? > > > > > > > > > > ByteBuffer.wrap(array, ..., ...).equals(ByteBuffer.wrap(array2, > ..., > > > > ...)) > > > > > > > > > > This shouldn't have overheads as far as I remember. > > > > > > > > > > Or implement your own loop? It shouldn't be more than a couple of > > > lines. > > > > > > > > > > Best, > > > > > Piotrek > > > > > > > > > > czw., 9 lis 2023 o 06:43 Rui Fan <1996fan...@gmail.com> > napisał(a): > > > > > > > > > > > Hi Piotr, Archit, Feng and Hangxiang: > > > > > > > > > > > > Thanks a lot for your feedback! > > > > > > > > > > > > Following is my comment, please correct me if I misunderstood > > > anything! > > > > > > > > > > > > To Piotr: > > > > > > > > > > > > > Is there a reason why you are suggesting to copy out bytes from > > > `buf` > > > > > to > > > > > > `bytes`, > > > > > > > instead of using `Arrays.equals(int[] a, int aFromIndex, int > > > > aToIndex, > > > > > > int[] b, int bFromIndex, int bToIndex)`? > > > > > > > > > > > > I see java8 doesn't have `Arrays.equals(int[] a, int aFromIndex, > > int > > > > > > aToIndex, int[] b, int bFromIndex, int bToIndex)`, > > > > > > and java11 has it. Do you have any other suggestions for java8? > > > > > > > > > > > > Also, this code doesn't run in production. As the comment of > > > > > > System.lineSeparator(): > > > > > > > > > > > > > On UNIX systems, it returns {@code "\n"}; on Microsoft > > > > > > > Windows systems it returns {@code "\r\n"}. > > > > > > > > > > > > So Mac and Linux just return one character, we will compare > > > > > > one byte directly. > > > > > > > > > > > > > > > > > > > > > > > > To Feng: > > > > > > > > > > > > > Will they be written to the taskManager.log file by default > > > > > > > or the taskManager.out file? > > > > > > > > > > > > I prefer LOG as the default value for > taskmanager.system-out.mode. > > > > > > It's useful for job stability and doesn't introduce significant > > > impact > > > > to > > > > > > users. Also, our production has already used this feature for > > > > > > more than 1 years, it works well. > > > > > > > > > > > > However, I write the DEFAULT as the default value for > > > > > > taskmanager.system-out.mode, because when the community > introduces > > > > > > new options, the default value often selects the original > behavior. > > > > > > > > > > > > Looking forward to hear more thoughts from community about this > > > > > > default value. > > > > > > > > > > > > > If we can make taskmanager.out splittable and rolling, would it > > be > > > > > > > easier for users to use this feature? > > > > > > > > > > > > Making taskmanager.out splittable and rolling is a good choice! > > > > > > I have some concerns about it: > > > > > > > > > > > > 1. Users may also want to use LOG.info in their code and just > > > > > > accidentally use System.out.println. It is possible that they > > will > > > > > > also find the logs directly in taskmanager.log. > > > > > > 2. I'm not sure whether the rolling strategy is easy to > implement. > > > > > > If we do it, it's necessary to define a series of flink options > > > > similar > > > > > > to log options, such as: fileMax(how many files should be > > > retained), > > > > > > fileSize(The max size each file), fileNamePatten (The suffix of > > > file > > > > > > name), > > > > > > 3. Check the file size periodically: all logs are written by log > > > > plugin, > > > > > > they can check the log file size after writing. However, > > System.out > > > > > > are written directly. And flink must start a thread to check > the > > > > latest > > > > > > taskmanager.out size periodically. If it's too quick, most of > job > > > > > aren't > > > > > > necessary. If it's too slow, the file size cannot be controlled > > > > > properly. > > > > > > > > > > > > Redirect it to LOG.info may be a reasonable and easy choice. > > > > > > The user didn't really want to log into taskmanager.out, it just > > > > > > happened by accident. > > > > > > > > > > > > > > > > > > > > > > > > To Hangxiang: > > > > > > > > > > > > > 1. I have a similar concern as Feng. Will we redirect to > another > > > log > > > > > file > > > > > > > not taskManager.log ? > > > > > > > > > > > > Please see my last comment, thanks! > > > > > > > > > > > > > taskManager.log contains lots of important information like > init > > > log. > > > > > It > > > > > > > will be rolled quickly if we redirect out and error here. > > > > > > > > > > > > IIUC, this issue isn't caused by System.out, and it can happen if > > > user > > > > > > call a lot of LOG.info. As I mentioned before: the user didn't > > really > > > > > want > > > > > > to log into taskmanager.out, it just happened by accident. > > > > > > So, if users change the System.out to LOG.info, it still happen. > > > > > > > > > > > > > 2. Since we have redirected to LOG mode, Could we also log the > > > > subtask > > > > > > info > > > > > > > ? It may help us to debug granularly. > > > > > > > > > > > > I'm not sure what `log the subtask info` means. Let me confirm > with > > > you > > > > > > first. > > > > > > Do you mean like this: LOG.info("taskName {} : {}", taskName, > > > > > > userLogContext)? > > > > > > > > > > > > Best, > > > > > > Rui > > > > > > > > > > > > On Thu, Nov 9, 2023 at 11:47 AM Hangxiang Yu < > master...@gmail.com> > > > > > wrote: > > > > > > > > > > > > > Hi, Rui. > > > > > > > Thanks for the proposal. It sounds reasonable. > > > > > > > I have some questions, PTAL: > > > > > > > 1. I have a similar concern as Feng. Will we redirect to > another > > > log > > > > > file > > > > > > > not taskManager.log ? > > > > > > > taskManager.log contains lots of important information like > init > > > log. > > > > > It > > > > > > > will be rolled quickly if we redirect out and error here. > > > > > > > 2. Since we have redirected to LOG mode, Could we also log the > > > > subtask > > > > > > info > > > > > > > ? It may help us to debug granularly. > > > > > > > > > > > > > > On Thu, Nov 9, 2023 at 9:47 AM Feng Jin <jinfeng1...@gmail.com > > > > > > wrote: > > > > > > > > > > > > > > > Hi, Rui. > > > > > > > > > > > > > > > > Thank you for initiating this proposal. > > > > > > > > > > > > > > > > I have a question regarding redirecting stdout and stderr to > > LOG: > > > > > > > > > > > > > > > > Will they be written to the taskManager.log file by default > or > > > the > > > > > > > > taskManager.out file? > > > > > > > > If we can make taskmanager.out splittable and rolling, would > it > > > be > > > > > > easier > > > > > > > > for users to use this feature? > > > > > > > > > > > > > > > > Best, > > > > > > > > Feng > > > > > > > > > > > > > > > > On Thu, Nov 9, 2023 at 3:15 AM Archit Goyal > > > > > > <argo...@linkedin.com.invalid > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hi Rui, > > > > > > > > > > > > > > > > > > Thanks for the proposal. > > > > > > > > > > > > > > > > > > The proposed solution of supporting System out and err to > be > > > > > > redirected > > > > > > > > to > > > > > > > > > LOG or discarded and introducing an enum and two options to > > > > manage > > > > > > > this, > > > > > > > > > seems reasonable. > > > > > > > > > > > > > > > > > > +1 > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > Archit Goyal > > > > > > > > > > > > > > > > > > > > > > > > > > > From: Piotr Nowojski <pnowoj...@apache.org> > > > > > > > > > Date: Wednesday, November 8, 2023 at 7:38 AM > > > > > > > > > To: dev@flink.apache.org <dev@flink.apache.org> > > > > > > > > > Subject: Re: [DISCUSS] FLIP-390: Support System out and err > > to > > > be > > > > > > > > > redirected to LOG or discarded > > > > > > > > > Hi Rui, > > > > > > > > > > > > > > > > > > Thanks for the proposal. > > > > > > > > > > > > > > > > > > +1 I don't have any major comments :) > > > > > > > > > > > > > > > > > > One nit. In `SystemOutRedirectToLog` in this code: > > > > > > > > > > > > > > > > > > System.arraycopy(buf, count - > > LINE_SEPARATOR_LENGTH, > > > > > > bytes, > > > > > > > 0, > > > > > > > > > LINE_SEPARATOR_LENGTH); > > > > > > > > > return Arrays.equals(LINE_SEPARATOR_BYTES, > bytes) > > > > > > > > > > > > > > > > > > Is there a reason why you are suggesting to copy out bytes > > from > > > > > `buf` > > > > > > > to > > > > > > > > > `bytes`, > > > > > > > > > instead of using `Arrays.equals(int[] a, int aFromIndex, > int > > > > > > aToIndex, > > > > > > > > > int[] b, int bFromIndex, int bToIndex)`? > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > Piotrek > > > > > > > > > > > > > > > > > > śr., 8 lis 2023 o 11:53 Rui Fan <1996fan...@gmail.com> > > > > napisał(a): > > > > > > > > > > > > > > > > > > > Hi all! > > > > > > > > > > > > > > > > > > > > I would like to start a discussion of FLIP-390: Support > > > System > > > > > out > > > > > > > and > > > > > > > > > err > > > > > > > > > > to be redirected to LOG or discarded[1]. > > > > > > > > > > > > > > > > > > > > In various production environments, either cloud native > or > > > > > physical > > > > > > > > > > machines, the disk space that Flink TaskManager can use > is > > > > > limited. > > > > > > > > > > > > > > > > > > > > In general, the flink users shouldn't use the > > > > > `System.out.println` > > > > > > in > > > > > > > > > > production, > > > > > > > > > > however this may happen when the number of Flink jobs and > > job > > > > > > > > developers > > > > > > > > > > is very large. Flink job may use System.out to output a > > large > > > > > > amount > > > > > > > of > > > > > > > > > > data > > > > > > > > > > to the taskmanager.out file. This file will not roll, it > > will > > > > > > always > > > > > > > > > > increment. > > > > > > > > > > Eventually the upper limit of what the TM can be used for > > is > > > > > > reached. > > > > > > > > > > > > > > > > > > > > We can support System out and err to be redirected to LOG > > or > > > > > > > discarded, > > > > > > > > > > the LOG can roll and won't increment forever. > > > > > > > > > > > > > > > > > > > > This feature is useful for SREs who maintain Flink > > > > environments, > > > > > > they > > > > > > > > can > > > > > > > > > > redirect System.out to LOG by default. Although the cause > > of > > > > this > > > > > > > > problem > > > > > > > > > > is > > > > > > > > > > that the user's code is not standardized, for SRE, > pushing > > > > users > > > > > to > > > > > > > > > modify > > > > > > > > > > the code one by one is usually a very time-consuming > > > operation. > > > > > > It's > > > > > > > > also > > > > > > > > > > useful for job stability where System.out is accidentally > > > used. > > > > > > > > > > > > > > > > > > > > Looking forward to your feedback, thanks~ > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fx%2F4guZE&data=05%7C01%7Cargoyal%40linkedin.com%7C937821de7bd846e6b97408dbe070beae%7C72f988bf86f141af91ab2d7cd011db47%7C0%7C0%7C638350547072823674%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=zEv6B0Xiq2SNuU6Fm%2BAXnH%2BRILbm6Q0uDRbN7h6iaPM%3D&reserved=0 > > > > > > > > > <https://cwiki.apache.org/confluence/x/4guZE> > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > Rui > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Best, > > > > > > > Hangxiang. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > Best, > > Hangxiang. > > > -- Best, Hangxiang.