Re: Log4J

2017-02-19 Thread Robert Metzger
Hi Chet,

These are the files I have in my lib/ folder with the working log4j2
integration:

-rw-r--r--  1 robert robert 79966937 Oct 10 13:49 flink-dist_2.10-1.1.3.jar
-rw-r--r--  1 robert robert    90883 Dec  9 20:13 flink-python_2.10-1.1.3.jar
-rw-r--r--  1 robert robert    60547 Dec  9 18:45 log4j-1.2-api-2.7.jar
-rw-rw-r--  1 robert robert  1638598 Oct 22 16:08 log4j2-gelf-1.3.1-shaded.jar
-rw-rw-r--  1 robert robert     1056 Dec  9 20:12 log4j2.properties
-rw-r--r--  1 robert robert   219001 Dec  9 18:45 log4j-api-2.7.jar
-rw-r--r--  1 robert robert  1296865 Dec  9 18:45 log4j-core-2.7.jar
-rw-r--r--  1 robert robert    22918 Dec  9 18:46 log4j-slf4j-impl-2.7.jar

You don't need the "log4j2-gelf-1.3.1-shaded.jar"; that's a GELF appender
for Graylog2.
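For reference, a minimal log4j2.properties along these lines might look as follows. This is a sketch, not the actual 1056-byte file from the listing above; the appender, pattern, and levels are assumptions. The monitorInterval setting is what gives you the runtime reconfiguration this thread started with: log4j 2.x re-reads the file at that interval.

```properties
# Re-check this file for changes every 30 seconds (log4j 2.x hot reload)
monitorInterval = 30

appenders = console

appender.console.type = Console
appender.console.name = ConsoleAppender
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n

rootLogger.level = INFO
rootLogger.appenderRefs = console
rootLogger.appenderRef.console.ref = ConsoleAppender
```

Editing the level or pattern while the cluster is running should then take effect within roughly 30 seconds, without restarting the JobManager.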

On Mon, Feb 20, 2017 at 5:41 AM, Chet Masterson wrote:

> I read through the link you provided, Stephan. However, I am still
> confused. The instructions mention specific jar files for Logback, but I am
> not sure which of the log4j 2.x jars I need to put in the Flink /lib
> directory. I tried various combinations of log4j-1.2-api-2.8.jar,
> log4j-slf4j-impl-2.8.jar, log4j-to-slf4j-2.8.jar, and renamed the stock
> log4j-1.2.17.jar and slf4j-log4j12-1.7.7.jar, but then the job manager
> would not start and threw a 'NoClassDefFoundError:
> org/apache/logging/log4j/LogManager'. And this is without deploying my
> job out there, so I don't think any of the "Use Logback when running Flink
> out of the IDE / from a Java application" section instructions are relevant.
>
> Can someone be more specific about how to do this? If I get it to work,
> I'll be happy to formally document it in whatever format would help the
> project out long term.
>
> Thanks!
>
>
> 16.02.2017, 05:54, "Stephan Ewen" :
>
> Hi!
>
> The bundled log4j version (1.x) does not support that.
>
> But you can replace the logging jars with those of a different framework
> (like log4j 2.x), which supports changing the configuration without
> stopping the application.
> You don't need to rebuild Flink; simply replace two jars in the "lib"
> folder (and update the config file, because log4j 2.x has a different
> config format).
>
> This guide shows how to swap log4j 1.x for logback, and you should be able
> to swap in log4j 2.x in the exact same way.
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/monitoring/best_practices.html#use-logback-when-running-flink-on-a-cluster
>
>
> On Thu, Feb 16, 2017 at 5:20 AM, Chet Masterson wrote:
>
> Is there a way to reload a log4j.properties file without stopping and
> starting the job server?
>
>


Re: Log4J

2017-02-19 Thread Chet Masterson
I read through the link you provided, Stephan. However, I am still confused.
The instructions mention specific jar files for Logback, but I am not sure
which of the log4j 2.x jars I need to put in the Flink /lib directory. I
tried various combinations of log4j-1.2-api-2.8.jar, log4j-slf4j-impl-2.8.jar,
log4j-to-slf4j-2.8.jar, and renamed the stock log4j-1.2.17.jar and
slf4j-log4j12-1.7.7.jar, but then the job manager would not start and threw a
'NoClassDefFoundError: org/apache/logging/log4j/LogManager'. And this is
without deploying my job out there, so I don't think any of the "Use Logback
when running Flink out of the IDE / from a Java application" section
instructions are relevant.

Can someone be more specific about how to do this? If I get it to work, I'll
be happy to formally document it in whatever format would help the project
out long term.

Thanks!

16.02.2017, 05:54, "Stephan Ewen":

Hi!

The bundled log4j version (1.x) does not support that.

But you can replace the logging jars with those of a different framework
(like log4j 2.x), which supports changing the configuration without stopping
the application. You don't need to rebuild Flink; simply replace two jars in
the "lib" folder (and update the config file, because log4j 2.x has a
different config format).

This guide shows how to swap log4j 1.x for logback, and you should be able
to swap in log4j 2.x in the exact same way:

https://ci.apache.org/projects/flink/flink-docs-release-1.2/monitoring/best_practices.html#use-logback-when-running-flink-on-a-cluster

On Thu, Feb 16, 2017 at 5:20 AM, Chet Masterson wrote:

Is there a way to reload a log4j.properties file without stopping and
starting the job server?

Re: Jobmanager was killed when disk less 10% in yarn

2017-02-19 Thread wangzhijiang999
The log just indicates that the SignalHandler handled the kill signal and the
JobManager process exited; it does not tell you the reason. You may want to
check the container log from the node manager to see why it was killed.

Best,
Zhijiang

--
From: lining jing
Sent: Monday, February 20, 2017, 10:13
To: user
Subject: Jobmanager was killed when disk less 10% in yarn
Hi,
I use YARN to manage resources. Recently, when the disk had less than 10%
free space, the JobManager was killed. I want to know whether the disk
problem was the reason.

log : 

2017-02-19 03:20:37,087 INFO  org.apache.flink.yarn.YarnApplicationMasterRunner         - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
2017-02-19 03:20:37,088 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Stopping checkpoint coordinator for job 1b45608e30808183913eeffbb4d855da
2017-02-19 03:20:37,088 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Stopping checkpoint coordinator for job 1b45608e30808183913eeffbb4d855da
2017-02-19 03:20:37,089 INFO  org.apache.flink.runtime.blob.BlobCache                    - Shutting down BlobCache
2017-02-19 03:20:37,089 INFO  org.apache.flink.runtime.webmonitor.WebRuntimeMonitor      - Removing web dashboard root cache directory /tmp/flink-web-dfa2b369-44ea-4e35-8011-672a1e627a10
2017-02-19 03:20:37,089 INFO  org.apache.flink.runtime.blob.BlobCache                    - Shutting down BlobCache
2017-02-19 03:20:37,137 INFO  org.apache.flink.runtime.webmonitor.WebRuntimeMonitor      - Removing web dashboard jar upload directory /tmp/flink-web-upload-d6edb5ea-5894-489b-89f7-f2972fc9433d
2017-02-19 03:20:37,138 INFO  org.apache.flink.runtime.blob.BlobServer                   - Stopped BLOB server at 0.0.0.0:54513




Jobmanager was killed when disk less 10% in yarn

2017-02-19 Thread lining jing
Hi,

I use YARN to manage resources. Recently, when the disk had less than 10%
free space, the JobManager was killed. I want to know whether the disk
problem was the reason.


log :


2017-02-19 03:20:37,087 INFO  org.apache.flink.yarn.YarnApplicationMasterRunner         - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
2017-02-19 03:20:37,088 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Stopping checkpoint coordinator for job 1b45608e30808183913eeffbb4d855da
2017-02-19 03:20:37,088 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Stopping checkpoint coordinator for job 1b45608e30808183913eeffbb4d855da
2017-02-19 03:20:37,089 INFO  org.apache.flink.runtime.blob.BlobCache                    - Shutting down BlobCache
2017-02-19 03:20:37,089 INFO  org.apache.flink.runtime.webmonitor.WebRuntimeMonitor      - Removing web dashboard root cache directory /tmp/flink-web-dfa2b369-44ea-4e35-8011-672a1e627a10
2017-02-19 03:20:37,089 INFO  org.apache.flink.runtime.blob.BlobCache                    - Shutting down BlobCache
2017-02-19 03:20:37,137 INFO  org.apache.flink.runtime.webmonitor.WebRuntimeMonitor      - Removing web dashboard jar upload directory /tmp/flink-web-upload-d6edb5ea-5894-489b-89f7-f2972fc9433d
2017-02-19 03:20:37,138 INFO  org.apache.flink.runtime.blob.BlobServer                   - Stopped BLOB server at 0.0.0.0:54513


Re: Checkpointing with RocksDB as statebackend

2017-02-19 Thread 施晓罡(星罡)
Hi Vinay
Can you provide the LOG file from RocksDB? It helps a lot to figure out the
problems because it records the options and the events that happened during
the execution. Unless configured otherwise, it should be located at the path
set in System.getProperty("java.io.tmpdir").

Typically, a large amount of memory is consumed by RocksDB to store necessary
indices. To avoid unlimited growth in memory consumption, you can put these
indices into the block cache (set CacheIndexAndFilterBlocks to true) and
properly set the block cache size.

You can also increase the number of background threads to improve the
performance of flushes and compactions (via MaxBackgroundFlushes and
MaxBackgroundCompactions).

In YARN clusters, task managers will be killed if their memory utilization
exceeds the allocation size. Currently Flink does not count the memory used
by RocksDB in the allocation. We are working on fine-grained resource
allocation (see FLINK-5131), which may help to avoid such problems.

I hope this information helps you.

Regards,
Xiaogang
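The options above can be wired into Flink's RocksDB state backend through an OptionsFactory. The following is a sketch assuming the Flink 1.2-era contrib API (org.apache.flink.contrib.streaming.state) and the RocksDB JNI setters of that time; the cache size and thread counts are illustrative assumptions, not recommendations.

```java
import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

// Sketch: apply the tuning options discussed above via Flink's OptionsFactory.
public class TunedRocksDBOptions implements OptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        return currentOptions
                .setMaxBackgroundFlushes(2)       // more flush threads
                .setMaxBackgroundCompactions(4);  // more compaction threads
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
        BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
                .setBlockCacheSize(256 * 1024 * 1024)  // bound the block cache (256 MB here)
                .setCacheIndexAndFilterBlocks(true);   // keep indices/filters inside that bound
        return currentOptions.setTableFormatConfig(tableConfig);
    }
}
```

Usage would be something like `rocksDbBackend.setOptions(new TunedRocksDBOptions());` on the RocksDBStateBackend instance before passing it to the execution environment.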

--
From: Vinay Patil
Sent: Friday, February 17, 2017, 21:19
To: user
Subject: Re: Checkpointing with RocksDB as statebackend
Hi Guys,

There seems to be some issue with RocksDB memory utilization.

Within a few minutes of the job running, the physical memory usage increases
by 4-5 GB and keeps on increasing. I have tried different options for Max
Buffer Size (30MB, 64MB, 128MB, 512MB) and Min Buffer to Merge as 2, but the
physical memory keeps on increasing.

According to the RocksDB documentation, these are the main options on which
flushing to storage is based.

Can you please point out where I am going wrong? I have tried different
configuration options, but each time the Task Manager gets killed after some
time :)

Regards,
Vinay Patil

On Thu, Feb 16, 2017 at 6:02 PM, Vinay Patil  wrote:
I think it's more related to RocksDB. I am also not familiar with RocksDB,
but I am reading the tuning guide to understand the important values that can
be set.

Regards,
Vinay Patil

On Thu, Feb 16, 2017 at 5:48 PM, Stefan Richter [via Apache Flink User Mailing 
List archive.]  wrote:


What kind of problem are we talking about, S3-related or RocksDB-related? I
am not aware of problems with RocksDB per se. I think seeing logs for this
would be very helpful.
On 16.02.2017 at 11:56, Aljoscha Krettek <[hidden email]> wrote:
[hidden email] and [hidden email], could this be the same problem that you
recently saw when working with other people?

On Wed, 15 Feb 2017 at 17:23 Vinay Patil <[hidden email]> wrote:
Hi Guys,

Can anyone please help me with this issue?

Regards,
Vinay Patil

On Wed, Feb 15, 2017 at 6:17 PM, Vinay Patil <[hidden email]> wrote:
Hi Ted,

I have 3 boxes in my pipeline: the 1st and 2nd boxes contain a source and an
S3 sink, and the 3rd box is a window operator followed by chained operators
and an S3 sink.

So in the details link section I can see that the S3 sink is taking time for
the acknowledgement and it is not even getting to the window operator chain.

But as shown in the snapshot, checkpoint id 19 did not get any
acknowledgement. Not sure what is causing the issue.

Regards,
Vinay Patil

On Wed, Feb 15, 2017 at 5:51 PM, Ted Yu [via Apache Flink User Mailing List 
archive.] <[hidden email]> wrote:


What did the More Details link say?

Thanks


> On Feb 15, 2017, at 3:11 AM, vinay patil <[hidden email]> wrote:
>
> Hi,
>
> I have kept the checkpointing interval at 6secs and the minimum pause
> between checkpoints at 5secs. While testing the pipeline I have observed
> that some checkpoints take a long time. As you can see in the attached
> snapshot, checkpoint id 19 took the maximum time before it failed, although
> it had not received any acknowledgements; during these 10 minutes the
> entire pipeline did not make any progress and no data was being processed.
> (For example: in 13 minutes 20M records were processed, and when the
> checkpoint took time there was no progress for the next 10 minutes.)
>
> I have even tried setting the max checkpoint timeout to 3min, but in that
> case multiple checkpoints were failing as well.
>
> I have set RocksDB FLASH_SSD_OPTION.
> What could be the issue?
>
> P.S. I am writing to 3 S3 sinks
>
> checkpointing_issue.PNG
>
> --
> View this message in context:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Checkpointing-with-RocksDB-as-statebackend-tp11640.html
> Sent from the Apache Flink User Mailing List archive at Nabble.com.
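The checkpoint settings quoted above map to Flink's DataStream API roughly as follows. This is a sketch using the poster's figures (6s interval, 5s minimum pause, 3min timeout); the class name is hypothetical.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSettings {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Trigger a checkpoint every 6 seconds
        env.enableCheckpointing(6000);

        // Require a minimum pause of 5 seconds between consecutive checkpoints
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(5000);

        // Fail any checkpoint that takes longer than 3 minutes
        env.getCheckpointConfig().setCheckpointTimeout(3 * 60 * 1000);
    }
}
```

Note that with a 6-second interval and a 3-minute timeout, a single slow checkpoint can stall up to 30 would-be checkpoint cycles, which matches the observed 10-minute stalls spanning several failed checkpoints.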











