[jira] [Commented] (FLINK-17691) FlinkKafkaProducer transactional.id too long when using Semantic.EXACTLY_ONCE

2020-11-19 Thread John Mathews (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235745#comment-17235745
 ] 

John Mathews commented on FLINK-17691:
--

Submitted a fix: https://github.com/apache/flink/pull/14144
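
For reference, a minimal sketch of the MD5 idea from the issue description 
(assumed names; not necessarily what the PR does):

{code:java}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class TransactionalIdPrefixHasher {

    // Hash the taskName + "-" + operatorUniqueID prefix so the resulting
    // transactional.id stays far below Kafka's Short.MAX_VALUE (32767)
    // byte limit, no matter how long the task name gets.
    static String hashedPrefix(String taskName, String operatorUniqueID) {
        String prefix = taskName + "-" + operatorUniqueID;
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(prefix.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(2 * digest.length);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            // Always 32 hex characters, and deterministic, so ids stay
            // stable across job restarts.
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should always be available", e);
        }
    }
}
{code}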

> FlinkKafkaProducer transactional.id too long when using Semantic.EXACTLY_ONCE
> -
>
> Key: FLINK-17691
> URL: https://issues.apache.org/jira/browse/FLINK-17691
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.10.0, 1.11.0
>Reporter: freezhan
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2020-05-14-20-43-57-414.png, 
> image-2020-05-14-20-45-24-030.png, image-2020-05-14-20-45-59-878.png, 
> image-2020-05-14-21-09-01-906.png, image-2020-05-14-21-16-43-810.png, 
> image-2020-05-14-21-17-09-784.png
>
>
> When sinking to Kafka using the Semantic.EXACTLY_ONCE mode, the Flink Kafka 
> Connector producer automatically sets the transactional.id, and any 
> user-defined value is ignored.
>  
> When the job's operator name is too long, sends fail because the 
> transactional.id exceeds the Kafka coordinator_key length limit:
> !image-2020-05-14-21-09-01-906.png!
>  
> *The Flink Kafka Connector's policy for automatically generating the 
> transactional.id is as follows:*
>  
> 1. Use taskName + "-" + operatorUniqueID as the transactional.id prefix 
> (this may be too long):
>   getRuntimeContext().getTaskName() + "-" + ((StreamingRuntimeContext) 
> getRuntimeContext()).getOperatorUniqueID()
> 2. The range of available transactional ids is 
> [nextFreeTransactionalId, nextFreeTransactionalId + parallelism * 
> kafkaProducersPoolSize)
> !image-2020-05-14-20-43-57-414.png!
>   !image-2020-05-14-20-45-24-030.png!
> !image-2020-05-14-20-45-59-878.png!
>  
> *The Kafka transactional.id check policy is as follows:*
>  
> the string's byte length cannot be larger than Short.MAX_VALUE (32767)
> !image-2020-05-14-21-16-43-810.png!
> !image-2020-05-14-21-17-09-784.png!
>  
> *To reproduce this bug, the following conditions must be met:*
>  
>  # send messages to Kafka in exactly-once mode
>  # the combined length of the taskName and operatorUniqueID is larger than 
> 32767 (a very long SQL statement or window definition can produce this)
> *I suggest two possible solutions:*
>  
>      1. Allow users to customize the transactional.id prefix,
> or
>      2. Apply MD5 to the prefix before returning the real transactional.id
>  
>  
>  
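> For illustration, a simplified sketch of the generation policy described 
> above (assumed names; this is not the actual FlinkKafkaProducer code):
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
>
> final class TransactionalIdRangeSketch {
>     // Each subtask claims a block of ids in
>     // [nextFreeTransactionalId,
>     //  nextFreeTransactionalId + parallelism * kafkaProducersPoolSize).
>     // Every id carries the (possibly very long) prefix, so a huge
>     // taskName makes every generated id exceed Kafka's limit.
>     static List<String> transactionalIds(
>             String prefix,
>             long nextFreeTransactionalId,
>             int parallelism,
>             int kafkaProducersPoolSize) {
>         List<String> ids = new ArrayList<>();
>         long end = nextFreeTransactionalId
>                 + (long) parallelism * kafkaProducersPoolSize;
>         for (long id = nextFreeTransactionalId; id < end; id++) {
>             ids.add(prefix + "-" + id);
>         }
>         return ids;
>     }
> }
> {code}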



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17691) FlinkKafkaProducer transactional.id too long when using Semantic.EXACTLY_ONCE

2020-11-19 Thread John Mathews (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235662#comment-17235662
 ] 

John Mathews commented on FLINK-17691:
--

Hey [~aljoscha], I am happy to submit a PR to fix this bug. Can I simply submit 
one that truncates the taskName? I think we can truncate either in the 
TransactionIdGenerator or in the TaskInfo constructor itself, depending on 
whether we want the limit to apply everywhere or not.
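
A minimal sketch of the truncation idea (assumed names, not actual Flink 
internals):

{code:java}
import java.nio.charset.StandardCharsets;

final class TaskNameTruncation {

    // Leave generous headroom for the "-" separator, the operator id,
    // and the numeric transaction suffix, all of which are appended to
    // the taskName when the transactional.id is built.
    private static final int MAX_TASK_NAME_BYTES = 1000;

    static String truncate(String taskName) {
        byte[] utf8 = taskName.getBytes(StandardCharsets.UTF_8);
        if (utf8.length <= MAX_TASK_NAME_BYTES) {
            return taskName;
        }
        // Deterministic truncation keeps ids stable across restarts; a
        // multi-byte character cut at the boundary decodes to the
        // replacement character, which is still deterministic.
        return new String(utf8, 0, MAX_TASK_NAME_BYTES, StandardCharsets.UTF_8);
    }
}
{code}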

> FlinkKafkaProducer transactional.id too long when using Semantic.EXACTLY_ONCE
> -
>
> Key: FLINK-17691
> URL: https://issues.apache.org/jira/browse/FLINK-17691
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.10.0, 1.11.0
>Reporter: freezhan
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2020-05-14-20-43-57-414.png, 
> image-2020-05-14-20-45-24-030.png, image-2020-05-14-20-45-59-878.png, 
> image-2020-05-14-21-09-01-906.png, image-2020-05-14-21-16-43-810.png, 
> image-2020-05-14-21-17-09-784.png
>
>
> When sinking to Kafka using the Semantic.EXACTLY_ONCE mode, the Flink Kafka 
> Connector producer automatically sets the transactional.id, and any 
> user-defined value is ignored.
>  
> When the job's operator name is too long, sends fail because the 
> transactional.id exceeds the Kafka coordinator_key length limit:
> !image-2020-05-14-21-09-01-906.png!
>  
> *The Flink Kafka Connector's policy for automatically generating the 
> transactional.id is as follows:*
>  
> 1. Use taskName + "-" + operatorUniqueID as the transactional.id prefix 
> (this may be too long):
>   getRuntimeContext().getTaskName() + "-" + ((StreamingRuntimeContext) 
> getRuntimeContext()).getOperatorUniqueID()
> 2. The range of available transactional ids is 
> [nextFreeTransactionalId, nextFreeTransactionalId + parallelism * 
> kafkaProducersPoolSize)
> !image-2020-05-14-20-43-57-414.png!
>   !image-2020-05-14-20-45-24-030.png!
> !image-2020-05-14-20-45-59-878.png!
>  
> *The Kafka transactional.id check policy is as follows:*
>  
> the string's byte length cannot be larger than Short.MAX_VALUE (32767)
> !image-2020-05-14-21-16-43-810.png!
> !image-2020-05-14-21-17-09-784.png!
>  
> *To reproduce this bug, the following conditions must be met:*
>  
>  # send messages to Kafka in exactly-once mode
>  # the combined length of the taskName and operatorUniqueID is larger than 
> 32767 (a very long SQL statement or window definition can produce this)
> *I suggest two possible solutions:*
>  
>      1. Allow users to customize the transactional.id prefix,
> or
>      2. Apply MD5 to the prefix before returning the real transactional.id
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20227) Kafka transaction IDs exceeding limit

2020-11-18 Thread John Mathews (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Mathews updated FLINK-20227:
-
Description: 
Flink uses the task name to generate the transactionId for the kafka producers. 
See: 
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducer.java#L1088

If the task name is sufficiently large (e.g. there are a large number of column 
names present), this can cause Kafka to fail to produce records with error:

"Error writing field 'coordinator_key': String length 34155 is larger than the 
maximum string length."

with stacktrace:
"
org.apache.kafka.common.protocol.types.SchemaException: {throwable0_message}
    at org.apache.kafka.common.protocol.types.Schema.write(Schema.java:61)
    at org.apache.kafka.common.protocol.types.Struct.writeTo(Struct.java:441)
    at org.apache.kafka.common.requests.AbstractRequestResponse.serialize(AbstractRequestResponse.java:30)
    at org.apache.kafka.common.requests.AbstractRequest.serialize(AbstractRequest.java:101)
    at org.apache.kafka.common.requests.AbstractRequest.toSend(AbstractRequest.java:94)
    at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:499)
    at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:473)
    at org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:433)
    at org.apache.kafka.clients.producer.internals.Sender.maybeSendTransactionalRequest(Sender.java:437)
    at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:286)
    at org.apache.kafka.clients.producer.internals.Sender.run"

Is there a way to control these task names for the Table API + SQL? If not, can 
we limit the characters to ensure it is less than the 32k limit Kafka imposes?
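
For clarity, a sketch of the kind of length check I mean (assumed names; this 
is not existing Flink code):

{code:java}
import java.nio.charset.StandardCharsets;

final class TransactionalIdLengthCheck {

    // Kafka rejects any transactional.id longer than Short.MAX_VALUE
    // (32767) bytes when serializing the coordinator_key field.
    private static final int KAFKA_MAX_ID_BYTES = Short.MAX_VALUE;

    // Fail fast with a readable message instead of dying later inside
    // the producer's network layer.
    static String ensureFits(String transactionalId) {
        int length = transactionalId.getBytes(StandardCharsets.UTF_8).length;
        if (length > KAFKA_MAX_ID_BYTES) {
            throw new IllegalArgumentException(
                    "transactional.id is " + length + " bytes, but Kafka allows"
                            + " at most " + KAFKA_MAX_ID_BYTES
                            + "; shorten the task or operator names");
        }
        return transactionalId;
    }
}
{code}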

  was:
Flink uses the task name to generate the transactionId for the kafka producers. 
See: 
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducer.java#L1088

If the task name is sufficiently large (e.g. there are a large number of column 
names present), this can cause Kafka to fail to produce records with:

"Error writing field 'coordinator_key': String length 34155 is larger than the 
maximum string length."

with stacktrace:
"
org.apache.kafka.common.protocol.types.SchemaException: {throwable0_message}
    at org.apache.kafka.common.protocol.types.Schema.write(Schema.java:61)
    at org.apache.kafka.common.protocol.types.Struct.writeTo(Struct.java:441)
    at org.apache.kafka.common.requests.AbstractRequestResponse.serialize(AbstractRequestResponse.java:30)
    at org.apache.kafka.common.requests.AbstractRequest.serialize(AbstractRequest.java:101)
    at org.apache.kafka.common.requests.AbstractRequest.toSend(AbstractRequest.java:94)
    at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:499)
    at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:473)
    at org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:433)
    at org.apache.kafka.clients.producer.internals.Sender.maybeSendTransactionalRequest(Sender.java:437)
    at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:286)
    at org.apache.kafka.clients.producer.internals.Sender.run"

Is there a way to control these task names for the Table API + SQL? If not, can 
we limit the characters to ensure it is less than the 32k limit Kafka imposes?


> Kafka transaction IDs exceeding limit
> -
>
> Key: FLINK-20227
> URL: https://issues.apache.org/jira/browse/FLINK-20227
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Reporter: John Mathews
>Priority: Major
>
> Flink uses the task name to generate the transactionId for the kafka 
> producers. See: 
> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducer.java#L1088
> If the task name is sufficiently large (e.g. there are a large number of 
> column names present), this can cause Kafka to fail to produce records with 
> error:
> "Error writing field 'coordinator_key': String length 34155 is larger than 
> the maximum string length."
> with stacktrace:
> "
> org.apache.kafka.common.protocol.types.SchemaException: {throwable0_message}
>     at org.apache.kafka.common.protocol.types.Schema.write(Schema.java:61)
>     at org.apache.kafka.common.protocol.types.Struct.writeTo(Struct.java:441)
>     at org.apache.kafka.common.requests.AbstractRequestResponse.serialize(AbstractRequestResponse.java:30)
>     at org.apache.kafka.common.requests.AbstractRequest.serialize(AbstractRequest.java:101)
>     at
> or

[jira] [Updated] (FLINK-20227) Kafka transaction IDs exceeding limit

2020-11-18 Thread John Mathews (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Mathews updated FLINK-20227:
-
Description: 
Flink uses the task name to generate the transactionId for the kafka producers. 
See: 
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducer.java#L1088

If the task name is sufficiently large (e.g. there are a large number of column 
names present), this can cause Kafka to fail to produce records with:

"Error writing field 'coordinator_key': String length 34155 is larger than the 
maximum string length."

with stacktrace:
"
org.apache.kafka.common.protocol.types.SchemaException: {throwable0_message}
    at org.apache.kafka.common.protocol.types.Schema.write(Schema.java:61)
    at org.apache.kafka.common.protocol.types.Struct.writeTo(Struct.java:441)
    at org.apache.kafka.common.requests.AbstractRequestResponse.serialize(AbstractRequestResponse.java:30)
    at org.apache.kafka.common.requests.AbstractRequest.serialize(AbstractRequest.java:101)
    at org.apache.kafka.common.requests.AbstractRequest.toSend(AbstractRequest.java:94)
    at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:499)
    at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:473)
    at org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:433)
    at org.apache.kafka.clients.producer.internals.Sender.maybeSendTransactionalRequest(Sender.java:437)
    at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:286)
    at org.apache.kafka.clients.producer.internals.Sender.run"

Is there a way to control these task names for the Table API + SQL? If not, can 
we limit the characters to ensure it is less than the 32k limit Kafka imposes?

  was:
Flink uses the task name to generate the transactionId for the kafka producers. 
See: 
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducer.java#L1088

If the task name is sufficiently large (e.g. there are a large number of column 
names present), this can cause Kafka to fail to produce records with:

```Error writing field 'coordinator_key': String length 34155 is larger than 
the maximum string length.```

with stacktrace:
```
org.apache.kafka.common.protocol.types.SchemaException: {throwable0_message}
    at org.apache.kafka.common.protocol.types.Schema.write(Schema.java:61)
    at org.apache.kafka.common.protocol.types.Struct.writeTo(Struct.java:441)
    at org.apache.kafka.common.requests.AbstractRequestResponse.serialize(AbstractRequestResponse.java:30)
    at org.apache.kafka.common.requests.AbstractRequest.serialize(AbstractRequest.java:101)
    at org.apache.kafka.common.requests.AbstractRequest.toSend(AbstractRequest.java:94)
    at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:499)
    at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:473)
    at org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:433)
    at org.apache.kafka.clients.producer.internals.Sender.maybeSendTransactionalRequest(Sender.java:437)
    at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:286)
    at org.apache.kafka.clients.producer.internals.Sender.run
```

Is there a way to control these task names for the Table API + SQL? If not, can 
we limit the characters to ensure it is less than the 32k limit Kafka imposes?


> Kafka transaction IDs exceeding limit
> -
>
> Key: FLINK-20227
> URL: https://issues.apache.org/jira/browse/FLINK-20227
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Reporter: John Mathews
>Priority: Major
>
> Flink uses the task name to generate the transactionId for the kafka 
> producers. See: 
> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducer.java#L1088
> If the task name is sufficiently large (e.g. there are a large number of 
> column names present), this can cause Kafka to fail to produce records with:
> "Error writing field 'coordinator_key': String length 34155 is larger than 
> the maximum string length."
> with stacktrace:
> "
> org.apache.kafka.common.protocol.types.SchemaException: {throwable0_message}
>     at org.apache.kafka.common.protocol.types.Schema.write(Schema.java:61)
>     at org.apache.kafka.common.protocol.types.Struct.writeTo(Struct.java:441)
>     at org.apache.kafka.common.requests.AbstractRequestResponse.serialize(AbstractRequestResponse.java:30)
>     at org.apache.kafka.common.requests.AbstractRequest.serialize(AbstractRequest.java:101)
>     at org.apach

[jira] [Created] (FLINK-20227) Kafka transaction IDs exceeding limit

2020-11-18 Thread John Mathews (Jira)
John Mathews created FLINK-20227:


 Summary: Kafka transaction IDs exceeding limit
 Key: FLINK-20227
 URL: https://issues.apache.org/jira/browse/FLINK-20227
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Reporter: John Mathews


Flink uses the task name to generate the transactionId for the kafka producers. 
See: 
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducer.java#L1088

If the task name is sufficiently large (e.g. there are a large number of column 
names present), this can cause Kafka to fail to produce records with:

```Error writing field 'coordinator_key': String length 34155 is larger than 
the maximum string length.```

with stacktrace:
```
org.apache.kafka.common.protocol.types.SchemaException: {throwable0_message}
    at org.apache.kafka.common.protocol.types.Schema.write(Schema.java:61)
    at org.apache.kafka.common.protocol.types.Struct.writeTo(Struct.java:441)
    at org.apache.kafka.common.requests.AbstractRequestResponse.serialize(AbstractRequestResponse.java:30)
    at org.apache.kafka.common.requests.AbstractRequest.serialize(AbstractRequest.java:101)
    at org.apache.kafka.common.requests.AbstractRequest.toSend(AbstractRequest.java:94)
    at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:499)
    at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:473)
    at org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:433)
    at org.apache.kafka.clients.producer.internals.Sender.maybeSendTransactionalRequest(Sender.java:437)
    at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:286)
    at org.apache.kafka.clients.producer.internals.Sender.run
```

Is there a way to control these task names for the Table API + SQL? If not, can 
we limit the characters to ensure it is less than the 32k limit Kafka imposes?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-16488) Logging issues when running through K8s

2020-03-10 Thread John Mathews (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056134#comment-17056134
 ] 

John Mathews edited comment on FLINK-16488 at 3/10/20, 4:59 PM:


[~azagrebin] Given that you are working on consolidating / cleaning up the 
Flink Docker environment, are there any plans to address this issue: 
https://issues.apache.org/jira/browse/FLINK-7990? I know it is only 
tangentially related to what you are doing, but it seems like this is an issue 
for running Flink through Docker / K8s right now, given that Docker / K8s 
generally expect logs to go to stdout.
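
In the meantime, a possible workaround, assuming the image still uses log4j 1.x 
(as Flink 1.8 does), is to point the root logger at a console appender so 
everything lands on stdout. A sketch of such a log4j.properties, not an 
official config:

{code}
# Route all logs to stdout so Docker/K8s can collect them
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
{code}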


was (Author: jmathews3773):
[~azagrebin] Given that you are working on consolidating / cleaning up the 
Flink Docker environment, are there any plans to address this issue: 
https://issues.apache.org/jira/browse/FLINK-7990? I know it is only 
tangentially related to what you are doing, but it seems like this is an issue 
for running Flink through Docker / K8s right now.

> Logging issues when running through K8s
> ---
>
> Key: FLINK-16488
> URL: https://issues.apache.org/jira/browse/FLINK-16488
> Project: Flink
>  Issue Type: Bug
>  Components: Client / Job Submission, Deployment / Kubernetes
>Affects Versions: 1.8.0
>Reporter: John Mathews
>Priority: Major
>
> When running a slimmed-down version of the WordCount example, I am seeing some 
> very strange logging behavior when using the K8s setup described on the site. 
> Essentially, every log line before the env.execute() call shows up and every 
> log line after it does not (verified both through the UI and by grepping 
> within the container itself through K8s).
>  
> Running the code below displays the following output: 
>  
>  2020-03-07 23:14:06,967 INFO _TOKEN_LOG_1
>  2020-03-07 23:14:06,968 INFO _TOKEN_LOG_2
>  2020-03-07 23:14:06,968 INFO _TOKEN_LOG_3
>  2020-03-07 23:14:06,970 INFO _TOKEN_LOG_4
>  2020-03-07 23:14:06,970 INFO _TOKEN_LOG_5
>   
> The job completes successfully, but there is no sign of either a _TOKEN_LOG_6 
> or a _TOKEN_LOG_ERROR, and the counts themselves don't print. I have tested 
> a few permutations of this, and all logging stops as soon as an 
> environment.execute* command is called.
>  
>  
> Any idea on what is happening to these logs?
>  
>  
> --
> Repro steps:
> 1) Set up the K8s environment as per: 
> [https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html]
> 2) Upload a jar with the code below
> 3) Run the jar with no arguments
> 4) Observe that there is no printing of _TOKEN_LOG_6 or _TOKEN_SOUT_6 in 
> the K8s containers or on the UI. There is also no printing of an error.
> {code:java}
> public static void main(String[] args) {
> log.info("_TOKEN_LOG_1");
> System.out.println("_TOKEN_SOUT_1");
> // ExecutionEnvironment env = 
> ExecutionEnvironment.getExecutionEnvironment();
> ParameterTool params = ParameterTool.fromArgs(args);
> final ExecutionEnvironment env = 
> ExecutionEnvironment.getExecutionEnvironment();
> log.info("_TOKEN_LOG_2");
> System.out.println("_TOKEN_SOUT_2");
> // make parameters available in the web interface
> env.getConfig().setGlobalJobParameters(params);
> // get default test text data
> DataSet<String> text = getTextDataSet(env);
> log.info("_TOKEN_LOG_3");
> System.out.println("_TOKEN_SOUT_3");
> DataSet<Tuple2<String, Integer>> counts =
> // split up the lines in pairs (2-tuples) containing: (word,1)
> text.flatMap(new Tokenizer())
> // group by the tuple field "0" and sum up tuple field "1"
> .groupBy(0)
> .sum(1);
> log.info("_TOKEN_LOG_4");
> System.out.println("_TOKEN_SOUT_4");
> // emit result
> if (params.has("output")) {
> counts.writeAsCsv(params.get("output"), "\n", " ");
> // execute program
> try {
> env.execute("WordCount Example");
> } catch (Exception e) {
> e.printStackTrace();
> log.info("_TOKEN_LOG_ERROR", e);
> System.out.println("_TOKEN_SOUT_ERROR" + e.toString());
> }
> } else {
> log.info("_TOKEN_LOG_5");
> System.out.println("_TOKEN_SOUT_5");
> try {
> counts.print();
> } catch (Exception e) {
> e.printStackTrace();
> log.info("_TOKEN_LOG_ERROR2", e);
> System.out.println("_TOKEN_SOUT_ERROR2" + e.toString());
> }
> log.info("_TOKEN_LOG_6");
> System.out.println("_TOKEN_SOUT_6");
> }
> }
> private static DataSet<String> getTextDataSet(ExecutionEnvironment env) {
> // get default test text data
> return env.fromElements(
> "To be, or not to be,--that is the question:--

[jira] [Commented] (FLINK-16488) Logging issues when running through K8s

2020-03-10 Thread John Mathews (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056134#comment-17056134
 ] 

John Mathews commented on FLINK-16488:
--

[~azagrebin] Given that you are working on consolidating / cleaning up the 
Flink Docker environment, are there any plans to address this issue: 
https://issues.apache.org/jira/browse/FLINK-7990? I know it is only 
tangentially related to what you are doing, but it seems like this is an issue 
for running Flink through Docker / K8s right now.

> Logging issues when running through K8s
> ---
>
> Key: FLINK-16488
> URL: https://issues.apache.org/jira/browse/FLINK-16488
> Project: Flink
>  Issue Type: Bug
>  Components: Client / Job Submission, Deployment / Kubernetes
>Affects Versions: 1.8.0
>Reporter: John Mathews
>Priority: Major
>
> When running a slimmed-down version of the WordCount example, I am seeing some 
> very strange logging behavior when using the K8s setup described on the site. 
> Essentially, every log line before the env.execute() call shows up and every 
> log line after it does not (verified both through the UI and by grepping 
> within the container itself through K8s).
>  
> Running the code below displays the following output: 
>  
>  2020-03-07 23:14:06,967 INFO _TOKEN_LOG_1
>  2020-03-07 23:14:06,968 INFO _TOKEN_LOG_2
>  2020-03-07 23:14:06,968 INFO _TOKEN_LOG_3
>  2020-03-07 23:14:06,970 INFO _TOKEN_LOG_4
>  2020-03-07 23:14:06,970 INFO _TOKEN_LOG_5
>   
> The job completes successfully, but there is no sign of either a _TOKEN_LOG_6 
> or a _TOKEN_LOG_ERROR, and the counts themselves don't print. I have tested 
> a few permutations of this, and all logging stops as soon as an 
> environment.execute* command is called.
>  
>  
> Any idea on what is happening to these logs?
>  
>  
> --
> Repro steps:
> 1) Set up the K8s environment as per: 
> [https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html]
> 2) Upload a jar with the code below
> 3) Run the jar with no arguments
> 4) Observe that there is no printing of _TOKEN_LOG_6 or _TOKEN_SOUT_6 in 
> the K8s containers or on the UI. There is also no printing of an error.
> {code:java}
> public static void main(String[] args) {
> log.info("_TOKEN_LOG_1");
> System.out.println("_TOKEN_SOUT_1");
> // ExecutionEnvironment env = 
> ExecutionEnvironment.getExecutionEnvironment();
> ParameterTool params = ParameterTool.fromArgs(args);
> final ExecutionEnvironment env = 
> ExecutionEnvironment.getExecutionEnvironment();
> log.info("_TOKEN_LOG_2");
> System.out.println("_TOKEN_SOUT_2");
> // make parameters available in the web interface
> env.getConfig().setGlobalJobParameters(params);
> // get default test text data
> DataSet<String> text = getTextDataSet(env);
> log.info("_TOKEN_LOG_3");
> System.out.println("_TOKEN_SOUT_3");
> DataSet<Tuple2<String, Integer>> counts =
> // split up the lines in pairs (2-tuples) containing: (word,1)
> text.flatMap(new Tokenizer())
> // group by the tuple field "0" and sum up tuple field "1"
> .groupBy(0)
> .sum(1);
> log.info("_TOKEN_LOG_4");
> System.out.println("_TOKEN_SOUT_4");
> // emit result
> if (params.has("output")) {
> counts.writeAsCsv(params.get("output"), "\n", " ");
> // execute program
> try {
> env.execute("WordCount Example");
> } catch (Exception e) {
> e.printStackTrace();
> log.info("_TOKEN_LOG_ERROR", e);
> System.out.println("_TOKEN_SOUT_ERROR" + e.toString());
> }
> } else {
> log.info("_TOKEN_LOG_5");
> System.out.println("_TOKEN_SOUT_5");
> try {
> counts.print();
> } catch (Exception e) {
> e.printStackTrace();
> log.info("_TOKEN_LOG_ERROR2", e);
> System.out.println("_TOKEN_SOUT_ERROR2" + e.toString());
> }
> log.info("_TOKEN_LOG_6");
> System.out.println("_TOKEN_SOUT_6");
> }
> }
> private static DataSet<String> getTextDataSet(ExecutionEnvironment env) {
> // get default test text data
> return env.fromElements(
> "To be, or not to be,--that is the question:--",
> "Whether 'tis nobler in the mind to suffer",
> "The slings and arrows of outrageous fortune",
> "Or to take arms against a sea of troubles,",
> "And by opposing end them?--To die,--to sleep,--",
> "No more; and by a sleep to say we end",
> "The heartache, and the thousand natural shocks",
> "That flesh is heir to,--'tis a consummation",
> "Devoutly to be wish'd. To die,--to sleep;--",
>

[jira] [Comment Edited] (FLINK-16488) Logging issues when running through K8s

2020-03-10 Thread John Mathews (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056128#comment-17056128
 ] 

John Mathews edited comment on FLINK-16488 at 3/10/20, 4:49 PM:


[~azagrebin] I think you are right and this behavior is intentional (as per the 
ML discussion you linked). I do think it would make sense for this to be 
documented somewhere, as it definitely seems to violate the [principle of least 
astonishment|https://en.wikipedia.org/wiki/Principle_of_least_astonishment] 
though.

 



was (Author: jmathews3773):
[~azagrebin] I think you are right and this behavior is intentional (as per the 
ML discussion you linked). I do think it would make sense for this to be 
documented somewhere, as it definitely seems to violate the [principle of least 
astonishment|[https://en.wikipedia.org/wiki/Principle_of_least_astonishment]] 
though.

 

> Logging issues when running through K8s
> ---
>
> Key: FLINK-16488
> URL: https://issues.apache.org/jira/browse/FLINK-16488
> Project: Flink
>  Issue Type: Bug
>  Components: Client / Job Submission, Deployment / Kubernetes
>Affects Versions: 1.8.0
>Reporter: John Mathews
>Priority: Major
>
> When running a slimmed-down version of the WordCount example, I am seeing some 
> very strange logging behavior when using the K8s setup described on the site. 
> Essentially, every log line before the env.execute() call shows up and every 
> log line after it does not (verified both through the UI and by grepping 
> within the container itself through K8s).
>  
> Running the code below displays the following output: 
>  
>  2020-03-07 23:14:06,967 INFO _TOKEN_LOG_1
>  2020-03-07 23:14:06,968 INFO _TOKEN_LOG_2
>  2020-03-07 23:14:06,968 INFO _TOKEN_LOG_3
>  2020-03-07 23:14:06,970 INFO _TOKEN_LOG_4
>  2020-03-07 23:14:06,970 INFO _TOKEN_LOG_5
>   
> The job completes successfully, but there is no sign of either a _TOKEN_LOG_6 
> or a _TOKEN_LOG_ERROR, and the counts themselves don't print. I have tested 
> a few permutations of this, and all logging stops as soon as an 
> environment.execute* command is called.
>  
>  
> Any idea on what is happening to these logs?
>  
>  
> --
> Repro steps:
> 1) Set up the K8s environment as per: 
> [https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html]
> 2) Upload a jar with the code below
> 3) Run the jar with no arguments
> 4) Observe that there is no printing of _TOKEN_LOG_6 or _TOKEN_SOUT_6 in 
> the K8s containers or on the UI. There is also no printing of an error.
> {code:java}
> public static void main(String[] args) {
> log.info("_TOKEN_LOG_1");
> System.out.println("_TOKEN_SOUT_1");
> // ExecutionEnvironment env = 
> ExecutionEnvironment.getExecutionEnvironment();
> ParameterTool params = ParameterTool.fromArgs(args);
> final ExecutionEnvironment env = 
> ExecutionEnvironment.getExecutionEnvironment();
> log.info("_TOKEN_LOG_2");
> System.out.println("_TOKEN_SOUT_2");
> // make parameters available in the web interface
> env.getConfig().setGlobalJobParameters(params);
> // get default test text data
> DataSet<String> text = getTextDataSet(env);
> log.info("_TOKEN_LOG_3");
> System.out.println("_TOKEN_SOUT_3");
> DataSet<Tuple2<String, Integer>> counts =
> // split up the lines in pairs (2-tuples) containing: (word,1)
> text.flatMap(new Tokenizer())
> // group by the tuple field "0" and sum up tuple field "1"
> .groupBy(0)
> .sum(1);
> log.info("_TOKEN_LOG_4");
> System.out.println("_TOKEN_SOUT_4");
> // emit result
> if (params.has("output")) {
> counts.writeAsCsv(params.get("output"), "\n", " ");
> // execute program
> try {
> env.execute("WordCount Example");
> } catch (Exception e) {
> e.printStackTrace();
> log.info("_TOKEN_LOG_ERROR", e);
> System.out.println("_TOKEN_SOUT_ERROR" + e.toString());
> }
> } else {
> log.info("_TOKEN_LOG_5");
> System.out.println("_TOKEN_SOUT_5");
> try {
> counts.print();
> } catch (Exception e) {
> e.printStackTrace();
> log.info("_TOKEN_LOG_ERROR2", e);
> System.out.println("_TOKEN_SOUT_ERROR2" + e.toString());
> }
> log.info("_TOKEN_LOG_6");
> System.out.println("_TOKEN_SOUT_6");
> }
> }
> private static DataSet<String> getTextDataSet(ExecutionEnvironment env) {
> // get default test text data
> return env.fromElements(
> "To be, or not to be,--that is the question:--",
> "Whether 'tis nobler in the mind to suffer",
> "The slings and arrows of outrageous 

[jira] [Commented] (FLINK-16488) Logging issues when running through K8s

2020-03-10 Thread John Mathews (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056128#comment-17056128
 ] 

John Mathews commented on FLINK-16488:
--

[~azagrebin] I think you are right and this behavior is intentional (as per the 
ML discussion you linked). I do think it would make sense for this to be 
documented somewhere, as it definitely seems to violate the [principle of least 
astonishment|[https://en.wikipedia.org/wiki/Principle_of_least_astonishment]] 
though.

 

> Logging issues when running through K8s
> ---
>
> Key: FLINK-16488
> URL: https://issues.apache.org/jira/browse/FLINK-16488
> Project: Flink
>  Issue Type: Bug
>  Components: Client / Job Submission, Deployment / Kubernetes
>Affects Versions: 1.8.0
>Reporter: John Mathews
>Priority: Major
>
> When running a slimmed-down version of the WordCount example, I am seeing some 
> very strange logging behavior when using the K8s setup described on the site. 
> Essentially, every log line before the env.execute() call shows up and every 
> log line after it does not (verified both through the UI and by grepping 
> within the container itself through K8s).
>  
> Running the code below displays the following output: 
>  
>  2020-03-07 23:14:06,967 INFO _TOKEN_LOG_1
>  2020-03-07 23:14:06,968 INFO _TOKEN_LOG_2
>  2020-03-07 23:14:06,968 INFO _TOKEN_LOG_3
>  2020-03-07 23:14:06,970 INFO _TOKEN_LOG_4
>  2020-03-07 23:14:06,970 INFO _TOKEN_LOG_5
>   
> The job completes successfully, but there is no sign of either a _TOKEN_LOG_6 
> or a _TOKEN_LOG_ERROR, and the counts themselves don't print. I have tested 
> a few permutations of this, and all logging stops as soon as an 
> environment.execute* command is called.
>  
>  
> Any idea on what is happening to these logs?
>  
>  
> --
> Repro steps:
> 1) Set up the K8s environment as per: 
> [https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html]
> 2) Upload a jar with the code below
> 3) Run the jar with no arguments
> 4) Observe that there is no printing of _TOKEN_LOG_6 or _TOKEN_SOUT_6 in 
> the K8s containers or on the UI. There is also no printing of an error.
> {code:java}
> public static void main(String[] args) {
> log.info("_TOKEN_LOG_1");
> System.out.println("_TOKEN_SOUT_1");
> // ExecutionEnvironment env = 
> ExecutionEnvironment.getExecutionEnvironment();
> ParameterTool params = ParameterTool.fromArgs(args);
> final ExecutionEnvironment env = 
> ExecutionEnvironment.getExecutionEnvironment();
> log.info("_TOKEN_LOG_2");
> System.out.println("_TOKEN_SOUT_2");
> // make parameters available in the web interface
> env.getConfig().setGlobalJobParameters(params);
> // get default test text data
> DataSet<String> text = getTextDataSet(env);
> log.info("_TOKEN_LOG_3");
> System.out.println("_TOKEN_SOUT_3");
> DataSet<Tuple2<String, Integer>> counts =
> // split up the lines in pairs (2-tuples) containing: (word,1)
> text.flatMap(new Tokenizer())
> // group by the tuple field "0" and sum up tuple field "1"
> .groupBy(0)
> .sum(1);
> log.info("_TOKEN_LOG_4");
> System.out.println("_TOKEN_SOUT_4");
> // emit result
> if (params.has("output")) {
> counts.writeAsCsv(params.get("output"), "\n", " ");
> // execute program
> try {
> env.execute("WordCount Example");
> } catch (Exception e) {
> e.printStackTrace();
> log.info("_TOKEN_LOG_ERROR", e);
> System.out.println("_TOKEN_SOUT_ERROR" + e.toString());
> }
> } else {
> log.info("_TOKEN_LOG_5");
> System.out.println("_TOKEN_SOUT_5");
> try {
> counts.print();
> } catch (Exception e) {
> e.printStackTrace();
> log.info("_TOKEN_LOG_ERROR2", e);
> System.out.println("_TOKEN_SOUT_ERROR2" + e.toString());
> }
> log.info("_TOKEN_LOG_6");
> System.out.println("_TOKEN_SOUT_6");
> }
> }
> private static DataSet<String> getTextDataSet(ExecutionEnvironment env) {
> // get default test text data
> return env.fromElements(
> "To be, or not to be,--that is the question:--",
> "Whether 'tis nobler in the mind to suffer",
> "The slings and arrows of outrageous fortune",
> "Or to take arms against a sea of troubles,",
> "And by opposing end them?--To die,--to sleep,--",
> "No more; and by a sleep to say we end",
> "The heartache, and the thousand natural shocks",
> "That flesh is heir to,--'tis a consummation",
> "Devoutly to be wish'd. To die,--to sleep;--",
> "To sleep! perch

[jira] [Updated] (FLINK-16488) Logging issues when running through K8s

2020-03-07 Thread John Mathews (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Mathews updated FLINK-16488:
-
Description: 
When running a slimmed-down version of the WordCount example, I am seeing some 
very strange logging behavior when using the K8s setup described on the site. 
Essentially, every log line before the env.execute() call shows up and every 
log line after it does not (verified both through the UI and by grepping 
within the container itself through K8s).

 

Running the code below displays the following output: 

 
 2020-03-07 23:14:06,967 INFO _TOKEN_LOG_1
 2020-03-07 23:14:06,968 INFO _TOKEN_LOG_2
 2020-03-07 23:14:06,968 INFO _TOKEN_LOG_3
 2020-03-07 23:14:06,970 INFO _TOKEN_LOG_4
 2020-03-07 23:14:06,970 INFO _TOKEN_LOG_5
  

The job completes successfully, but there is no sign of either a _TOKEN_LOG_6 
or a _TOKEN_LOG_ERROR, and the counts themselves don't print. I have tested a 
few permutations of this, and all logging stops as soon as an 
environment.execute* command is called.

 

 

Any idea on what is happening to these logs?

 

 

--

Repro steps:

1) Set up the K8s environment as per: 

[https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html]

2) Upload a jar with the code below

3) Run the jar with no arguments

4) Observe that there is no printing of _TOKEN_LOG_6 or _TOKEN_SOUT_6 in the 
K8s containers or on the UI. There is also no printing of an error.
{code:java}
public static void main(String[] args) {
log.info("_TOKEN_LOG_1");
System.out.println("_TOKEN_SOUT_1");
// ExecutionEnvironment env = 
ExecutionEnvironment.getExecutionEnvironment();
ParameterTool params = ParameterTool.fromArgs(args);
final ExecutionEnvironment env = 
ExecutionEnvironment.getExecutionEnvironment();
log.info("_TOKEN_LOG_2");
System.out.println("_TOKEN_SOUT_2");
// make parameters available in the web interface
env.getConfig().setGlobalJobParameters(params);

// get default test text data
DataSet<String> text = getTextDataSet(env);
log.info("_TOKEN_LOG_3");
System.out.println("_TOKEN_SOUT_3");
DataSet<Tuple2<String, Integer>> counts =
// split up the lines in pairs (2-tuples) containing: (word,1)
text.flatMap(new Tokenizer())
// group by the tuple field "0" and sum up tuple field "1"
.groupBy(0)
.sum(1);
log.info("_TOKEN_LOG_4");
System.out.println("_TOKEN_SOUT_4");
// emit result
if (params.has("output")) {
counts.writeAsCsv(params.get("output"), "\n", " ");
// execute program
try {
env.execute("WordCount Example");
} catch (Exception e) {
e.printStackTrace();
log.info("_TOKEN_LOG_ERROR", e);
System.out.println("_TOKEN_SOUT_ERROR" + e.toString());
}
} else {
log.info("_TOKEN_LOG_5");
System.out.println("_TOKEN_SOUT_5");
try {
counts.print();
} catch (Exception e) {
e.printStackTrace();
log.info("_TOKEN_LOG_ERROR2", e);
System.out.println("_TOKEN_SOUT_ERROR2" + e.toString());
}
log.info("_TOKEN_LOG_6");
System.out.println("_TOKEN_SOUT_6");
}
}

private static DataSet<String> getTextDataSet(ExecutionEnvironment env) {
// get default test text data
return env.fromElements(
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
"Or to take arms against a sea of troubles,",
"And by opposing end them?--To die,--to sleep,--",
"No more; and by a sleep to say we end",
"The heartache, and the thousand natural shocks",
"That flesh is heir to,--'tis a consummation",
"Devoutly to be wish'd. To die,--to sleep;--",
"To sleep! perchance to dream:--ay, there's the rub;",
"For in that sleep of death what dreams may come,",
"When we have shuffled off this mortal coil,",
"Must give us pause: there's the respect",
"That makes calamity of so long life;",
"For who would bear the whips and scorns of time,",
"The oppressor's wrong, the proud man's contumely,",
"The pangs of despis'd love, the law's delay,",
"The insolence of office, and the spurns",
"That patient merit of the unworthy takes,",
"When he himself might his quietus make",
"With a bare bodkin? who would these fardels bear,",
"To grunt and sweat under a weary life,",
"But that the dread of something after death,--",
"The undiscover'd country, from whose bourn",
"No traveller returns,--puzzles the will,",
"And makes us rather bea

[jira] [Updated] (FLINK-16488) Logging issues when running through K8s

2020-03-07 Thread John Mathews (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Mathews updated FLINK-16488:
-
Description: 
When running a slimmed-down version of the WordCount example, I am seeing some 
very strange logging behavior when using the K8s setup described on the site. 
Essentially, every log line before the env.execute() call shows up and every 
log line after it does not (verified both through the UI and by grepping 
within the container itself through K8s).

 

Running the code below displays the following output: 

 
 2020-03-07 23:14:06,967 INFO _TOKEN_LOG_1
 2020-03-07 23:14:06,968 INFO _TOKEN_LOG_2
 2020-03-07 23:14:06,968 INFO _TOKEN_LOG_3
 2020-03-07 23:14:06,970 INFO _TOKEN_LOG_4
 2020-03-07 23:14:06,970 INFO _TOKEN_LOG_5
  

The job completes successfully, but there is no sign of either a _TOKEN_LOG_6 
or a _TOKEN_LOG_ERROR. I have tested a few permutations of this, and all 
logging stops as soon as an environment.execute(*) command is called.

 

 

Any idea on what is happening to these logs?

 

 

--

Repro steps:

Set up the K8s environment described here:

[https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html]

Upload a jar with this code via the UI, and run it with no arguments: 
{code:java}
public static void main(String[] args) {
log.info("_TOKEN_LOG_1");
System.out.println("_TOKEN_SOUT_1");
// ExecutionEnvironment env = 
ExecutionEnvironment.getExecutionEnvironment();
ParameterTool params = ParameterTool.fromArgs(args);
final ExecutionEnvironment env = 
ExecutionEnvironment.getExecutionEnvironment();
log.info("_TOKEN_LOG_2");
System.out.println("_TOKEN_SOUT_1");
// make parameters available in the web interface
env.getConfig().setGlobalJobParameters(params);

// get default test text data
DataSet<String> text = getTextDataSet(env);
log.info("_TOKEN_LOG_3");
System.out.println("_TOKEN_SOUT_1");
DataSet<Tuple2<String, Integer>> counts =
// split up the lines in pairs (2-tuples) containing: (word,1)
text.flatMap(new Tokenizer())
// group by the tuple field "0" and sum up tuple field "1"
.groupBy(0)
.sum(1);
log.info("_TOKEN_LOG_4");
System.out.println("_TOKEN_SOUT_1");
// emit result
if (params.has("output")) {
counts.writeAsCsv(params.get("output"), "\n", " ");
// execute program
try {
env.execute("WordCount Example");
} catch (Exception e) {
e.printStackTrace();
log.info("_TOKEN_LOG_ERROR", e);
System.out.println("_TOKEN_SOUT_ERROR" + e.toString());
}
} else {
log.info("_TOKEN_LOG_5");
System.out.println("_TOKEN_SOUT_1");
try {
counts.print();
} catch (Exception e) {
e.printStackTrace();
log.info("_TOKEN_LOG_ERROR2", e);
System.out.println("_TOKEN_SOUT_ERROR2" + e.toString());
}
log.info("_TOKEN_LOG_6");
System.out.println("_TOKEN_SOUT_1");
}
}

private static DataSet<String> getTextDataSet(ExecutionEnvironment env) {
// get default test text data
return env.fromElements(
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
"Or to take arms against a sea of troubles,",
"And by opposing end them?--To die,--to sleep,--",
"No more; and by a sleep to say we end",
"The heartache, and the thousand natural shocks",
"That flesh is heir to,--'tis a consummation",
"Devoutly to be wish'd. To die,--to sleep;--",
"To sleep! perchance to dream:--ay, there's the rub;",
"For in that sleep of death what dreams may come,",
"When we have shuffled off this mortal coil,",
"Must give us pause: there's the respect",
"That makes calamity of so long life;",
"For who would bear the whips and scorns of time,",
"The oppressor's wrong, the proud man's contumely,",
"The pangs of despis'd love, the law's delay,",
"The insolence of office, and the spurns",
"That patient merit of the unworthy takes,",
"When he himself might his quietus make",
"With a bare bodkin? who would these fardels bear,",
"To grunt and sweat under a weary life,",
"But that the dread of something after death,--",
"The undiscover'd country, from whose bourn",
"No traveller returns,--puzzles the will,",
"And makes us rather bear those ills we have",
"Than fly to others that we know not of?",
"Thus conscie

[jira] [Created] (FLINK-16488) Logging issues when running through K8s

2020-03-07 Thread John Mathews (Jira)
John Mathews created FLINK-16488:


 Summary: Logging issues when running through K8s
 Key: FLINK-16488
 URL: https://issues.apache.org/jira/browse/FLINK-16488
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Kubernetes
Affects Versions: 1.8.0
 Environment: I am running with the K8s environment described here:

[https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html]

 

Here is a sample of the code I am running with: 
{code:java}
public static void main(String[] args) {
log.info("_TOKEN_LOG_1");
System.out.println("_TOKEN_SOUT_1");
// ExecutionEnvironment env = 
ExecutionEnvironment.getExecutionEnvironment();
ParameterTool params = ParameterTool.fromArgs(args);
final ExecutionEnvironment env = 
ExecutionEnvironment.getExecutionEnvironment();
log.info("_TOKEN_LOG_2");
System.out.println("_TOKEN_SOUT_1");
// make parameters available in the web interface
env.getConfig().setGlobalJobParameters(params);

// get default test text data
DataSet<String> text = getTextDataSet(env);
log.info("_TOKEN_LOG_3");
System.out.println("_TOKEN_SOUT_1");
DataSet<Tuple2<String, Integer>> counts =
// split up the lines in pairs (2-tuples) containing: (word,1)
text.flatMap(new Tokenizer())
// group by the tuple field "0" and sum up tuple field "1"
.groupBy(0)
.sum(1);
log.info("_TOKEN_LOG_4");
System.out.println("_TOKEN_SOUT_1");
// emit result
if (params.has("output")) {
counts.writeAsCsv(params.get("output"), "\n", " ");
// execute program
try {
env.execute("WordCount Example");
} catch (Exception e) {
e.printStackTrace();
log.info("_TOKEN_LOG_ERROR", e);
System.out.println("_TOKEN_SOUT_ERROR" + e.toString());
}
} else {
log.info("_TOKEN_LOG_5");
System.out.println("_TOKEN_SOUT_1");
try {
counts.print();
} catch (Exception e) {
e.printStackTrace();
log.info("_TOKEN_LOG_ERROR2", e);
System.out.println("_TOKEN_SOUT_ERROR2" + e.toString());
}
log.info("_TOKEN_LOG_6");
System.out.println("_TOKEN_SOUT_1");
}
}

private static DataSet<String> getTextDataSet(ExecutionEnvironment env) {
// get default test text data
return env.fromElements(
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
"Or to take arms against a sea of troubles,",
"And by opposing end them?--To die,--to sleep,--",
"No more; and by a sleep to say we end",
"The heartache, and the thousand natural shocks",
"That flesh is heir to,--'tis a consummation",
"Devoutly to be wish'd. To die,--to sleep;--",
"To sleep! perchance to dream:--ay, there's the rub;",
"For in that sleep of death what dreams may come,",
"When we have shuffled off this mortal coil,",
"Must give us pause: there's the respect",
"That makes calamity of so long life;",
"For who would bear the whips and scorns of time,",
"The oppressor's wrong, the proud man's contumely,",
"The pangs of despis'd love, the law's delay,",
"The insolence of office, and the spurns",
"That patient merit of the unworthy takes,",
"When he himself might his quietus make",
"With a bare bodkin? who would these fardels bear,",
"To grunt and sweat under a weary life,",
"But that the dread of something after death,--",
"The undiscover'd country, from whose bourn",
"No traveller returns,--puzzles the will,",
"And makes us rather bear those ills we have",
"Than fly to others that we know not of?",
"Thus conscience does make cowards of us all;",
"And thus the native hue of resolution",
"Is sicklied o'er with the pale cast of thought;",
"And enterprises of great pith and moment,",
"With this regard, their currents turn awry,",
"And lose the name of action.--Soft you now!",
"The fair Ophelia!--Nymph, in thy orisons",
"Be all my sins remember'd.");
}
{code}
 
Reporter: John Mathews


When running a slimmed-down version of the WordCount example, I am seeing some 
very strange logging behavior when using the K8s setup described on the site. 
Essentially, every log line before the env.execute() call shows up and every 
log line after it does not (verified both through the UI and by grepping 
within the container itself through K8s).

 

Running the c