[jira] [Created] (FLINK-8829) Flink in EMR(YARN) is down due to Akka communication issue

2018-03-01 Thread Aleksandr Filichkin (JIRA)
Aleksandr Filichkin created FLINK-8829:
--

 Summary: Flink in EMR(YARN) is down due to Akka communication issue
 Key: FLINK-8829
 URL: https://issues.apache.org/jira/browse/FLINK-8829
 Project: Flink
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.3.2
Reporter: Aleksandr Filichkin


Hi,

We have a Flink 1.3.2 app running in Amazon EMR. Every week our Flink job goes 
down due to:

_2018-02-16 19:00:04,595 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177]] Caused by: [Connection refused: ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com/10.97.34.209:42177]
2018-02-16 19:00:05,593 WARN akka.remote.RemoteWatcher - Detected unreachable: [akka.tcp://fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177]
2018-02-16 19:00:05,596 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Lost connection to JobManager akka.tcp://fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177/user/jobmanager. Triggering connection timeout._

Do you have any ideas on how to troubleshoot this?

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8827) When FLINK_CONF_DIR contains spaces, execute zookeeper related scripts failed

2018-03-01 Thread Donghui Xu (JIRA)
Donghui Xu created FLINK-8827:
-

 Summary: When FLINK_CONF_DIR contains spaces, execute zookeeper 
related scripts failed
 Key: FLINK-8827
 URL: https://issues.apache.org/jira/browse/FLINK-8827
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.4.0
 Environment: Red Hat Enterprise Linux Server release 6.5 (Santiago)
Reporter: Donghui Xu


When the path in FLINK_CONF_DIR includes spaces, executing the zookeeper-related 
scripts fails with the following error message: "Expect binary expression."





[DISCUSS] Convert main Table API classes into traits

2018-03-01 Thread Timo Walther

Hi everyone,

I'm currently thinking about how to implement FLINK-8606. The reason 
behind it is that Java users are able to see all variables and methods 
that are declared 'private[flink]' or even 'protected' in Scala. Classes 
such as TableEnvironment look very messy from the outside in Java. Since 
we cannot change the visibility of Scala protected members, I was 
thinking about a bigger change to solve this issue once and for all. My 
idea is to convert all TableEnvironment classes and maybe the Table 
class into traits. The actual implementation would end up in some 
internal classes such as "InternalTableEnvironment" that implement the 
public traits. The goal would be to stay source code compatible.
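In Java terms, the proposed split corresponds roughly to the sketch below (the interface and method names here are invented for illustration; they are not the actual Table API surface):

```java
import java.util.HashMap;
import java.util.Map;

// Public API surface: a trait/interface that exposes only the intended methods.
interface TableEnvironmentApi {
    void registerTable(String name, Object table);
    boolean isRegistered(String name);
}

// Internal implementation class holding the members that were previously
// 'private[flink]' in Scala and therefore leaked into the Java view.
class InternalTableEnvironment implements TableEnvironmentApi {
    private final Map<String, Object> tables = new HashMap<>();

    @Override
    public void registerTable(String name, Object table) {
        tables.put(name, table);
    }

    @Override
    public boolean isRegistered(String name) {
        return tables.containsKey(name);
    }
}
```

Java callers would only ever see the interface type, so internal state and helper methods stay hidden.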


What do you think?

Regards,
Timo



[jira] [Created] (FLINK-8826) In Flip6 mode, when starting yarn cluster, configured taskmanager.heap.mb is ignored

2018-03-01 Thread Piotr Nowojski (JIRA)
Piotr Nowojski created FLINK-8826:
-

 Summary: In Flip6 mode, when starting yarn cluster, configured 
taskmanager.heap.mb is ignored
 Key: FLINK-8826
 URL: https://issues.apache.org/jira/browse/FLINK-8826
 Project: Flink
  Issue Type: Bug
  Components: ResourceManager, YARN
Affects Versions: 1.5.0
Reporter: Piotr Nowojski


When I tried running a job on the cluster, I had configured

taskmanager.heap.mb = 3072

taskmanager.network.memory.fraction: 0.4

and the console reported

{{Cluster specification: ClusterSpecification\{masterMemoryMB=768, taskManagerMemoryMB=3072, numberTaskManagers=92, slotsPerTaskManager=1}}}

Yet the actual settings were:

{{
2018-03-01 14:53:18,918 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner -
2018-03-01 14:53:18,921 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner - Starting YARN TaskExecutor runner (Version: 1.5-SNAPSHOT, Rev:e92eb39, Date:28.02.2018 @ 17:43:39 UTC)
2018-03-01 14:53:18,921 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner - OS current user: yarn
2018-03-01 14:53:19,780 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner - Current Hadoop/Kerberos user: hadoop
2018-03-01 14:53:19,781 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.161-b14
2018-03-01 14:53:19,781 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner - Maximum heap size: 245 MiBytes
2018-03-01 14:53:19,781 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner - JAVA_HOME: /usr/lib/jvm/java-openjdk
2018-03-01 14:53:19,783 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner - Hadoop version: 2.4.1
2018-03-01 14:53:19,783 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner - JVM Options:
2018-03-01 14:53:19,783 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner -    -Xms255m
2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner -    -Xmx255m
2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner -    -XX:MaxDirectMemorySize=769m
2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner -    -Dlog.file=/var/log/hadoop-yarn/containers/application_1516373731080_1150/container_1516373731080_1150_01_000105/taskmanager.log
2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner -    -Dlogback.configurationFile=file:./logback.xml
2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner -    -Dlog4j.configuration=file:./log4j.properties
2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner - Program Arguments:
2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner -    --configDir
}}

Heap was set to 255 MB, while with the default cut-offs it should be 1383 MB. 
The 255 MB value appears to come from the default taskmanager.heap.mb value of 1024.
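For reference, the expected 1383m value can be reproduced with a little arithmetic. This is a sketch assuming the default containerized heap cut-off ratio of 0.25; the exact rounding Flink applies may differ:

```java
public class HeapCutoffSketch {
    // Rough reconstruction of the expected heap size: container memory minus
    // the containerized cut-off, minus the network buffer fraction.
    static double expectedHeapMb(double containerMb, double cutoffRatio, double networkFraction) {
        double afterCutoff = containerMb * (1 - cutoffRatio); // 3072 * 0.75 = 2304.0
        double networkMb = afterCutoff * networkFraction;     // 2304 * 0.4 ~= 921.6
        return afterCutoff - networkMb;                       // ~1382.4, matching the observed -Xmx1383m
    }

    public static void main(String[] args) {
        System.out.println(expectedHeapMb(3072, 0.25, 0.4)); // ~1382.4
    }
}
```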

 

When starting in non-Flip6 mode, everything works as expected:

{{
2018-03-01 14:04:49,650 INFO  org.apache.flink.yarn.YarnTaskManagerRunnerFactory -
2018-03-01 14:04:49,700 INFO  org.apache.flink.yarn.YarnTaskManagerRunnerFactory - Starting YARN TaskManager (Version: 1.5-SNAPSHOT, Rev:e92eb39, Date:28.02.2018 @ 17:43:39 UTC)
2018-03-01 14:04:49,700 INFO  org.apache.flink.yarn.YarnTaskManagerRunnerFactory - OS current user: yarn
2018-03-01 14:04:53,277 INFO  org.apache.flink.yarn.YarnTaskManagerRunnerFactory - Current Hadoop/Kerberos user: hadoop
2018-03-01 14:04:53,278 INFO  org.apache.flink.yarn.YarnTaskManagerRunnerFactory - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.161-b14
2018-03-01 14:04:53,279 INFO  org.apache.flink.yarn.YarnTaskManagerRunnerFactory - Maximum heap size: 1326 MiBytes
2018-03-01 14:04:53,279 INFO  org.apache.flink.yarn.YarnTaskManagerRunnerFactory - JAVA_HOME: /usr/lib/jvm/java-openjdk
2018-03-01 14:04:53,282 INFO  org.apache.flink.yarn.YarnTaskManagerRunnerFactory - Hadoop version: 2.4.1
2018-03-01 14:04:53,284 INFO  org.apache.flink.yarn.YarnTaskManagerRunnerFactory - JVM Options:
2018-03-01 14:04:53,284 INFO  org.apache.flink.yarn.YarnTaskManagerRunnerFactory -    -Xms1383m
2018-03-01 14:04:53,284 INFO  org.apache.flink.yarn.YarnTaskManagerRunnerFactory -    -Xmx1383m
2018-03-01 14:04:53,284 INFO  org.apache.flink.yarn.YarnTaskManagerRunnerFactory -
}}

[jira] [Created] (FLINK-8825) Disallow new String() without charset in checkstyle

2018-03-01 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-8825:
---

 Summary: Disallow new String() without charset in checkstyle
 Key: FLINK-8825
 URL: https://issues.apache.org/jira/browse/FLINK-8825
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Aljoscha Krettek
 Fix For: 1.5.0
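A minimal illustration of why {{new String()}} without a charset should be disallowed (class and example strings here are invented for the sketch):

```java
import java.nio.charset.StandardCharsets;

public class CharsetSketch {
    public static void main(String[] args) {
        byte[] utf8Bytes = "héllo".getBytes(StandardCharsets.UTF_8);
        // Platform-dependent: decodes with the JVM's default charset, so the
        // result can differ between machines (e.g. a latin-1 default mangles it).
        String risky = new String(utf8Bytes);
        // Deterministic: always decodes as UTF-8, on every platform.
        String safe = new String(utf8Bytes, StandardCharsets.UTF_8);
        System.out.println(safe.equals("héllo")); // true
        System.out.println(risky.length());       // varies with the default charset
    }
}
```

A checkstyle rule catching the one-argument constructor (and the matching {{String.getBytes()}} overload) makes test behavior reproducible across environments.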








[jira] [Created] (FLINK-8824) In Kafka Consumers, replace 'getCanonicalClassName()' with 'getClassName()'

2018-03-01 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8824:
---

 Summary: In Kafka Consumers, replace 'getCanonicalClassName()' 
with 'getClassName()'
 Key: FLINK-8824
 URL: https://issues.apache.org/jira/browse/FLINK-8824
 Project: Flink
  Issue Type: Bug
  Components: Kafka Connector
Reporter: Stephan Ewen
 Fix For: 1.5.0


The connector uses {{getCanonicalClassName()}} in all places, rather than 
{{getClassName()}}.

{{getCanonicalClassName()}} is intended to normalize class names for arrays, 
etc., but it is problematic when instantiating classes from class names.
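For context, the underlying {{java.lang.Class}} methods are {{getName()}} and {{getCanonicalName()}}; a sketch of why the canonical form breaks reflective instantiation (demo class names are invented):

```java
public class NameSketch {
    static class Inner {}

    // Returns true if Class.forName can load the given name.
    static boolean loadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The binary name uses '$' for nested classes and is what Class.forName expects:
        System.out.println(Inner.class.getName());          // NameSketch$Inner
        // The canonical name uses '.' and cannot be loaded reflectively:
        System.out.println(Inner.class.getCanonicalName()); // NameSketch.Inner
        System.out.println(loadable(Inner.class.getName()));          // true
        System.out.println(loadable(Inner.class.getCanonicalName())); // false
    }
}
```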







[jira] [Created] (FLINK-8823) Add network profiling/diagnosing metrics.

2018-03-01 Thread Piotr Nowojski (JIRA)
Piotr Nowojski created FLINK-8823:
-

 Summary: Add network profiling/diagnosing metrics.
 Key: FLINK-8823
 URL: https://issues.apache.org/jira/browse/FLINK-8823
 Project: Flink
  Issue Type: Sub-task
  Components: Network
Affects Versions: 1.5.0
Reporter: Piotr Nowojski








[jira] [Created] (FLINK-8822) RotateLogFile may not work well when sed version is below 4.2

2018-03-01 Thread Xin Liu (JIRA)
Xin Liu created FLINK-8822:
--

 Summary: RotateLogFile may not work well when sed version is below 
4.2
 Key: FLINK-8822
 URL: https://issues.apache.org/jira/browse/FLINK-8822
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.4.0
Reporter: Xin Liu
 Fix For: 1.5.0


In bin/config.sh, rotateLogFilesWithPrefix() uses extended regular expressions 
to process file names via "sed -E", but when the sed version is below 4.2 this 
fails with "sed: invalid option -- 'E'".

As a result, RotateLogFile does not work correctly: there will be only one log 
file, regardless of $MAX_LOG_FILE_NUMBER.

Using "sed -r" instead may be more suitable.





[jira] [Created] (FLINK-8821) Fix BigDecimal divide in AvgAggFunction

2018-03-01 Thread Ruidong Li (JIRA)
Ruidong Li created FLINK-8821:
-

 Summary: Fix BigDecimal divide in AvgAggFunction
 Key: FLINK-8821
 URL: https://issues.apache.org/jira/browse/FLINK-8821
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Reporter: Ruidong Li
Assignee: Ruidong Li


The DecimalAvgAggFunction lacks precision protection in its BigDecimal division.
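For context, {{BigDecimal.divide}} without an explicit scale or {{MathContext}} throws for non-terminating quotients, which is presumably what an average such as 1/3 runs into. The following is a sketch, not the actual Flink code:

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class DecimalAvgSketch {
    // Exact division: throws ArithmeticException for quotients like 1/3.
    static String naiveAvg(BigDecimal sum, long count) {
        return sum.divide(BigDecimal.valueOf(count)).toString();
    }

    // Division with bounded precision: always succeeds.
    static BigDecimal guardedAvg(BigDecimal sum, long count) {
        return sum.divide(BigDecimal.valueOf(count), MathContext.DECIMAL128);
    }

    public static void main(String[] args) {
        try {
            naiveAvg(BigDecimal.ONE, 3); // non-terminating expansion
        } catch (ArithmeticException e) {
            System.out.println("exact divide failed: " + e.getMessage());
        }
        System.out.println(guardedAvg(BigDecimal.ONE, 3)); // 0.3333...
    }
}
```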





Re: [DISCUS] Flink SQL Client dependency management

2018-03-01 Thread Fabian Hueske
I agree, option (2) would be the easiest approach for the users.


2018-03-01 0:00 GMT+01:00 Rong Rong :

> Hi Timo,
>
> Thanks for initiating the SQL client effort. I agree with Xingcan's
> points, adding to them: (1) most users of the SQL client will very likely
> have little Maven / build tool knowledge, and (2) the build script would
> most likely grow much more complex in the future, which makes it
> exponentially hard for users to modify themselves.
>
> On (3) the single "fat" jar idea, adding on to the dependency conflict
> issue, another very common way I see is that users often want to maintain a
> list of individual jars, such as a list of relatively-constant, handy UDFs
> every time using the SQL client. They will probably need to package and
> ship separately anyway. I was wondering if "download-and-drop-in" might be
> a more straightforward approach?
>
> Best,
> Rong
>
> On Tue, Feb 27, 2018 at 8:23 AM, Stephan Ewen  wrote:
>
> > I think one problem with the "one fat jar for all" is that some
> > dependencies clash in the classnames across versions:
> >   - Kafka 0.9, 0.10, 0.11, 1.0
> >   - Elasticsearch 2, 4, and 5
> >
> > There are probably others as well...
> >
> > On Tue, Feb 27, 2018 at 2:57 PM, Timo Walther 
> wrote:
> >
> > > Hi Xingcan,
> > >
> > > thank you for your feedback. Regarding (3) we also thought about that
> but
> > > this approach would not scale very well. Given that we might have fat
> > jars
> > > for multiple versions (Kafka 0.8, Kafka 0.6 etc.) such an all-in-one
> > > solution JAR file might easily go beyond 1 or 2 GB. I don't know if
> users
> > > want to download that just for a combination of connector and format.
> > >
> > > Timo
> > >
> > >
> > > Am 2/27/18 um 2:16 PM schrieb Xingcan Cui:
> > >
> > > Hi Timo,
> > >>
> > >> thanks for your efforts. Personally, I think the second option would
> be
> > >> better and here are my feelings.
> > >>
> > >> (1) The SQL client is designed to offer a convenient way for users to
> > >> manipulate data with Flink. Obviously, the second option would be more
> > >> easy-to-use.
> > >>
> > >> (2) The script will help to manage the dependencies automatically, but
> > >> with less flexibility. Once the script cannot meet the need, users
> have
> > to
> > >> modify it themselves.
> > >>
> > >> (3) I wonder whether we could package all these built-in connectors
> and
> > >> formats into a single JAR. With this all-in-one solution, users don’t
> > need
> > >> to consider much about the dependencies.
> > >>
> > >> Best,
> > >> Xingcan
> > >>
> > >> On 27 Feb 2018, at 6:38 PM, Stephan Ewen  wrote:
> > >>>
> > >>> My first intuition would be to go for approach #2 for the following
> > >>> reasons
> > >>>
> > >>> - I expect that in the long run, the scripts will not be that simple
> to
> > >>> maintain. We saw that with all shell scripts thus far: they start
> > simple,
> > >>> and then grow with many special cases for this and that setup.
> > >>>
> > >>> - Not all users have Maven, automatically downloading and configuring
> > >>> Maven could be an option, but that makes the scripts yet more tricky.
> > >>>
> > >>> - Download-and-drop-in is probably still easier to understand for
> users
> > >>> than the syntax of a script with its parameters
> > >>>
> > >>> - I think it may actually be even simpler to maintain for us, because
> > all
> > >>> it does is add a profile or build target to each connector to also
> > create
> > >>> the fat jar.
> > >>>
> > >>> - Storage space is no longer really a problem. Worst case we host the
> > fat
> > >>> jars in an S3 bucket.
> > >>>
> > >>>
> > >>> On Mon, Feb 26, 2018 at 7:33 PM, Timo Walther 
> > >>> wrote:
> > >>>
> > >>> Hi everyone,
> > 
> >  as you may know a first minimum version of FLIP-24 [1] for the
> > upcoming
> >  Flink SQL Client has been merged to the master. We also merged
> >  possibilities to discover and configure table sources without a
> single
> >  line
> >  of code using string-based properties [2] and Java service provider
> >  discovery.
> > 
> >  We are now facing the issue of how to manage dependencies in this
> new
> >  environment. It is different from how regular Flink projects are
> > created
> >  (by setting up a new Maven project and building a jar or fat jar).
> >  Ideally,
> >  a user should be able to select from a set of prepared connectors,
> >  catalogs, and formats. E.g., if a Kafka connector and Avro format is
> >  needed, all that should be required is to move a "flink-kafka.jar"
> and
> >  "flink-avro.jar" into the "sql_lib" directory that is shipped to a
> > Flink
> >  cluster together with the SQL query.
> > 
> >  The question is how do we want to offer those JAR files in the
> future?
> >  We
> >  see two options:
> > 
> >  1) We prepare Maven build profiles for all 

[jira] [Created] (FLINK-8820) FlinkKafkaConsumer010 reads too many bytes

2018-03-01 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8820:


 Summary: FlinkKafkaConsumer010 reads too many bytes
 Key: FLINK-8820
 URL: https://issues.apache.org/jira/browse/FLINK-8820
 Project: Flink
  Issue Type: Bug
  Components: Kafka Connector
Affects Versions: 1.4.0
Reporter: Fabian Hueske


A user reported that the FlinkKafkaConsumer010 very rarely consumes too many 
bytes, i.e., the returned message is too large. The application is running for 
about a year and the problem started to occur after upgrading to Flink 1.4.0.

The user made a good effort in debugging the problem but was not able to 
reproduce it in a controlled environment. It seems that the data is correctly 
stored in Kafka.

Here's the thread on the user mailing list with a detailed 
description of the problem and the analysis so far: 
https://lists.apache.org/thread.html/1d62f616d275e9e23a5215ddf7f5466051be7ea96897d827232fcb4e@%3Cuser.flink.apache.org%3E





Re: [DISCUSS] Releasing Flink 1.5.0

2018-03-01 Thread Till Rohrmann
Thanks for bringing this issue up Shashank. I think Aljoscha is taking a
look at the issue. It looks like a serious bug which we should definitely
fix. What I've heard so far is that it's not so trivial.

Cheers,
Till

On Thu, Mar 1, 2018 at 9:56 AM, shashank734  wrote:

> Can we have https://issues.apache.org/jira/browse/FLINK-7756 solved in this
> version? We are unable to use checkpointing with CEP and the RocksDB backend.
>
>
>
> --
> Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
>


Re: [DISCUSS] Releasing Flink 1.5.0

2018-03-01 Thread shashank734
Can we have https://issues.apache.org/jira/browse/FLINK-7756 solved in this 
version? We are unable to use checkpointing with CEP and the RocksDB backend.



--
Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/


[jira] [Created] (FLINK-8819) Rework travis script to use build stages

2018-03-01 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-8819:
---

 Summary: Rework travis script to use build stages
 Key: FLINK-8819
 URL: https://issues.apache.org/jira/browse/FLINK-8819
 Project: Flink
  Issue Type: Sub-task
  Components: Build System, Travis
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler


This issue is for tracking efforts to rework our Travis scripts to use 
[stages|https://docs.travis-ci.com/user/build-stages/].

This feature allows us to define a sequence of jobs that are run one after 
another. This implies that we can define dependencies between jobs, in contrast 
to our existing jobs that have to be self-contained.

As an example, we could have a compile stage, and a test stage with multiple 
jobs.

The main benefit here is that we no longer have to compile modules multiple 
times, which would reduce our build times.
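For illustration, a staged layout in {{.travis.yml}} could look roughly like the following sketch (stage names and Maven commands are invented here, not Flink's actual configuration):

```yaml
# Hypothetical staged layout; jobs in the same stage run in parallel,
# and a stage starts only after the previous stage succeeded.
jobs:
  include:
    - stage: compile
      script: mvn clean install -B -DskipTests
    - stage: test
      script: mvn verify -B -pl flink-core
    - stage: test
      script: mvn verify -B -pl flink-runtime
```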

The major issue here, however, is that there is no _proper_ support for passing 
build artifacts from one stage to the next. According to this 
[issue|https://github.com/travis-ci/beta-features/issues/28] it is on their 
to-do list.

In the meantime we could manually transfer the artifacts between stages by 
either using the Travis cache or some other external storage.
 The major concern here is that of cleaning up the cache/storage.
 We can clean things up if
 * our script fails
 * the last stage succeeds.

We can *not* clean things up if
 * the build is canceled
 * travis fails the build due to a timeout or similar

as apparently there is [no way to run a script at the end of a 
build|https://github.com/travis-ci/travis-ci/issues/4221].


