[jira] [Commented] (FLINK-12197) Avro row deser for Confluent binary format

2019-04-16 Thread eugen yushin (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818978#comment-16818978
 ] 

eugen yushin commented on FLINK-12197:
--

PR is available: [https://github.com/apache/flink/pull/8187]

> Avro row deser for Confluent binary format
> --
>
> Key: FLINK-12197
> URL: https://issues.apache.org/jira/browse/FLINK-12197
> Project: Flink
>  Issue Type: New Feature
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.8.0
>Reporter: eugen yushin
>Assignee: eugen yushin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, the following Avro deserializers are available:
> - core Avro binary layout for both Specific/Generic records and Row
> - Confluent Specific/Generic record layout
> The intention of this task is to fill the gap and provide a Confluent Avro 
> deserializer for the Row type.
> This will make it possible to use the Table API in pipelines that work with 
> Avro and the Confluent schema registry.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-12197) Avro row deser for Confluent binary format

2019-04-15 Thread eugen yushin (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eugen yushin updated FLINK-12197:
-
Description: 
Currently, the following Avro deserializers are available:
- core Avro binary layout for both Specific/Generic records and Row
- Confluent Specific/Generic record layout

The intention of this task is to fill the gap and provide a Confluent Avro 
deserializer for the Row type.
This will make it possible to use the Table API in pipelines that work with Avro 
and the Confluent schema registry.

  was:
Currently, the following avro deserializators are available:
- core avro binary layout for both Specific/Generic and Row
- Confluent Specific/Generic record layout
The intention of this task is to fill the gap and provide Confluent avro deser 
for Row type.
This will allow to harness Table API to pipelines that work with avro and 
Confluent schema registry.


> Avro row deser for Confluent binary format
> --
>
> Key: FLINK-12197
> URL: https://issues.apache.org/jira/browse/FLINK-12197
> Project: Flink
>  Issue Type: New Feature
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.8.0
>Reporter: eugen yushin
>Assignee: eugen yushin
>Priority: Major
>
> Currently, the following Avro deserializers are available:
> - core Avro binary layout for both Specific/Generic records and Row
> - Confluent Specific/Generic record layout
> The intention of this task is to fill the gap and provide a Confluent Avro 
> deserializer for the Row type.
> This will make it possible to use the Table API in pipelines that work with 
> Avro and the Confluent schema registry.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12197) Avro row deser for Confluent binary format

2019-04-15 Thread eugen yushin (JIRA)
eugen yushin created FLINK-12197:


 Summary: Avro row deser for Confluent binary format
 Key: FLINK-12197
 URL: https://issues.apache.org/jira/browse/FLINK-12197
 Project: Flink
  Issue Type: New Feature
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.8.0
Reporter: eugen yushin
Assignee: eugen yushin


Currently, the following Avro deserializers are available:
- core Avro binary layout for both Specific/Generic records and Row
- Confluent Specific/Generic record layout
The intention of this task is to fill the gap and provide a Confluent Avro 
deserializer for the Row type.
This will make it possible to use the Table API in pipelines that work with Avro 
and the Confluent schema registry.
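
For illustration, a minimal Scala sketch of the two deserializers that exist today 
and the gap between them (class names are from flink-avro and 
flink-avro-confluent-registry as of 1.8; the schema string and registry URL are 
placeholders):

{code}
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.flink.formats.avro.AvroRowDeserializationSchema
import org.apache.flink.formats.avro.registry.confluent.ConfluentRegistryAvroDeserializationSchema

object AvroDeserGapSketch {
  // Placeholder schema; any Avro record schema registered in the Confluent registry would do.
  val avroSchemaJson: String =
    """{"type":"record","name":"Event","fields":[{"name":"id","type":"long"}]}"""
  val schema: Schema = new Schema.Parser().parse(avroSchemaJson)

  // 1) Core Avro binary layout with Row output: fits the Table API, but does not
  //    understand the Confluent wire format (magic byte + schema id prefix).
  val rowDeser = new AvroRowDeserializationSchema(avroSchemaJson)

  // 2) Confluent wire format with Generic output: understands the registry framing,
  //    but produces GenericRecord rather than Row.
  val genericDeser: ConfluentRegistryAvroDeserializationSchema[GenericRecord] =
    ConfluentRegistryAvroDeserializationSchema.forGeneric(schema, "http://schema-registry:8081")

  // The missing combination, which this task proposes: Confluent wire format in, Row out.
}
{code}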



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10496) CommandLineParser arguments interleaving

2019-02-05 Thread eugen yushin (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eugen yushin updated FLINK-10496:
-
Description: 
*Business case:*
Run a Flink job cluster within Docker/k8s. The job takes an argument called 
`--config` which is not recognized at runtime.

{code:java}
Caused by: java.lang.RuntimeException: No data for required key 'config'
{code}

*Problem statement:*
The command line parser can't recognize job-specific arguments when they share a 
prefix with Flink's own options.

e.g.
[https://github.com/apache/flink/blob/master/flink-container/src/test/java/org/apache/flink/container/entrypoint/StandaloneJobClusterConfigurationParserFactoryTest.java#L52]

the following arguments result in a failure:
{code:java}
final String arg1 = "--config";
final String arg2 = "/path/to/job.yaml";{code}

*Reason*:
The Apache Commons CLI parser uses string prefix matching when parsing options, so 
it adds an extra --configDir entry to the result set instead of a new --config one.
https://github.com/apache/commons-cli/blob/cli-1.3.1/src/main/java/org/apache/commons/cli/DefaultParser.java#L391

*Scope*:
Update the commons-cli dependency to version 1.5-SNAPSHOT, which has a flag to 
disable partial matching.
https://github.com/apache/commons-cli/commit/bdb4a09ceaceab7e3d214b1beadb93bd9c911342

Update Flink's command line parser to utilize this feature.
https://github.com/apache/flink/blob/6258a4c333ce9dba914621b13eac2f7d91f5cb72/flink-runtime/src/main/java/org/apache/flink/runtime/entrypoint/parser/CommandLineParser.java#L45

  was:
*Business case:*
Run Flink job cluster within Docker/k8s. Job takes an argument called 
`--config` which can't be recognized in runtime.

{code:java}
Caused by: java.lang.RuntimeException: No data for required key 'config'
{code}

*Problem statement:*
Command line parser can't recognize job specific arguments when they have the 
same prefix as Flink's ones.

e.g.
[https://github.com/apache/flink/blob/master/flink-container/src/test/java/org/apache/flink/container/entrypoint/StandaloneJobClusterConfigurationParserFactoryTest.java#L52]

the following args results in failure:
{code:java}
final String arg1 = "--config";
final String arg2 = "/path/to/job.yaml";{code}

*Reason*:
Apache CLI parser use string prefix matching to parse options and adds extra 
--configDir to result set instead of adding new --config.
https://github.com/apache/commons-cli/blob/cli-1.3.1/src/main/java/org/apache/commons/cli/DefaultParser.java#L391

*Scope*:
Update commons-cli dependency with version 1.4 which has flag to disable 
partial matching.
https://github.com/apache/commons-cli/commit/bdb4a09ceaceab7e3d214b1beadb93bd9c911342

Update Flink's command line parser to utilize this feature.
https://github.com/apache/flink/blob/6258a4c333ce9dba914621b13eac2f7d91f5cb72/flink-runtime/src/main/java/org/apache/flink/runtime/entrypoint/parser/CommandLineParser.java#L45


> CommandLineParser arguments interleaving
> 
>
> Key: FLINK-10496
> URL: https://issues.apache.org/jira/browse/FLINK-10496
> Project: Flink
>  Issue Type: Bug
>  Components: Configuration, Core, Docker, Java API
>Affects Versions: 1.6.1, 1.7.0
>Reporter: eugen yushin
>Assignee: eugen yushin
>Priority: Major
>  Labels: pull-request-available
>
> *Business case:*
> Run a Flink job cluster within Docker/k8s. The job takes an argument called 
> `--config` which is not recognized at runtime.
> {code:java}
> Caused by: java.lang.RuntimeException: No data for required key 'config'
> {code}
> *Problem statement:*
> The command line parser can't recognize job-specific arguments when they share a 
> prefix with Flink's own options.
> e.g.
> [https://github.com/apache/flink/blob/master/flink-container/src/test/java/org/apache/flink/container/entrypoint/StandaloneJobClusterConfigurationParserFactoryTest.java#L52]
> the following arguments result in a failure:
> {code:java}
> final String arg1 = "--config";
> final String arg2 = "/path/to/job.yaml";{code}
> *Reason*:
> The Apache Commons CLI parser uses string prefix matching when parsing options, so 
> it adds an extra --configDir entry to the result set instead of a new --config one.
> https://github.com/apache/commons-cli/blob/cli-1.3.1/src/main/java/org/apache/commons/cli/DefaultParser.java#L391
> *Scope*:
> Update the commons-cli dependency to version 1.5-SNAPSHOT, which has a flag to 
> disable partial matching.
> https://github.com/apache/commons-cli/commit/bdb4a09ceaceab7e3d214b1beadb93bd9c911342
> Update Flink's command line parser to utilize this feature.
> https://github.com/apache/flink/blob/6258a4c333ce9dba914621b13eac2f7d91f5cb72/flink-runtime/src/main/java/org/apache/flink/runtime/entrypoint/parser/CommandLineParser.java#L45
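
To make the partial-matching behavior concrete, here is a minimal standalone sketch 
in Scala (assuming commons-cli 1.5+ on the classpath; the option set and arguments 
are illustrative, not the actual Flink entrypoint options):

{code}
import org.apache.commons.cli.{DefaultParser, Option => CliOption, Options}

object PartialMatchingSketch {
  def main(args: Array[String]): Unit = {
    // Only --configDir is registered, mirroring the situation described above.
    val options = new Options()
    options.addOption(CliOption.builder("c").longOpt("configDir").hasArg().build())

    val jobArgs = Array("--configDir", "/opt/flink/conf", "--config", "/path/to/job.yaml")

    // Default behavior: prefix matching resolves "--config" to "--configDir", so the
    // job-specific argument is swallowed as a second --configDir value.
    val lenient = new DefaultParser().parse(options, jobArgs, true)
    println(lenient.getOptionValues("configDir").mkString(", "))

    // With partial matching disabled (the flag added by the commons-cli commit
    // referenced above), "--config /path/to/job.yaml" is left over for the job itself.
    val strictParser = DefaultParser.builder().setAllowPartialMatching(false).build()
    val strict = strictParser.parse(options, jobArgs, true)
    println(strict.getArgList)
  }
}
{code}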



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10422) Follow AWS specs in Kinesis Consumer

2019-02-04 Thread eugen yushin (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eugen yushin updated FLINK-10422:
-
Affects Version/s: 1.7.1

> Follow AWS specs in Kinesis Consumer 
> -
>
> Key: FLINK-10422
> URL: https://issues.apache.org/jira/browse/FLINK-10422
> Project: Flink
>  Issue Type: Improvement
>  Components: Kinesis Connector
>Affects Versions: 1.6.1, 1.7.1
>Reporter: eugen yushin
>Assignee: eugen yushin
>Priority: Major
>  Labels: pull-request-available
>
> *Related conversation in mailing list:*
> [https://lists.apache.org/thread.html/96de3bac9761564767cf283b58d664f5ae1b076e0c4431620552af5b@%3Cdev.flink.apache.org%3E]
> *Summary:*
> The Flink Kinesis consumer checks shard ids against a particular pattern:
> {noformat}
> "^shardId-\\d{12}"
> {noformat}
> [https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/model/StreamShardHandle.java#L132]
> While this is in line with the current Kinesis Streams server implementation (all 
> streams follow this pattern), it conflicts with the AWS docs:
>  
> {code:java}
> ShardId
>  The unique identifier of the shard within the stream.
>  Type: String
>  Length Constraints: Minimum length of 1. Maximum length of 128.
> Pattern: [a-zA-Z0-9_.-]+
>  Required: Yes
> {code}
>  
> [https://docs.aws.amazon.com/kinesis/latest/APIReference/API_Shard.html]
> *Intention:*
>  We have no guarantees and can't rely on patterns other than those provided in 
> the AWS manifest.
>  Any custom Kinesis mock implementation should rely on the AWS manifest, which 
> defines ShardId as an alphanumeric string. The current check prevents anyone 
> from using Flink with such mocks.
> The reason behind using the particular pattern "^shardId-\\d{12}" is to create 
> Flink's custom Shard comparator, filter already-seen shards, and pass only the 
> latest shard to client.listShards in order to limit the scope of the RPC call 
> to AWS.
> In the meantime, I think we can get rid of this logic altogether. The current 
> usage in the project is:
>  - fix a Kinesalite bug (I've already opened an issue to cover this:
>  [https://github.com/mhart/kinesalite/issues/76] and opened PR: 
> [https://github.com/mhart/kinesalite/pull/77]). We can move this logic to the 
> test code base to keep the production code clean for now
>  
> [https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java#L464]
>  - adjust the last seen shard id. We can simply omit this because the AWS client 
> won't return already-seen shards; we will get only new ids or nothing.
> [https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/internals/KinesisDataFetcher.java#L475]
>  
> [https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java#L406]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10167) SessionWindows not compatible with typed DataStreams in scala

2019-01-31 Thread eugen yushin (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eugen yushin updated FLINK-10167:
-
Affects Version/s: 1.7.1

> SessionWindows not compatible with typed DataStreams in scala
> -
>
> Key: FLINK-10167
> URL: https://issues.apache.org/jira/browse/FLINK-10167
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.7.1
>Reporter: Andrew Roberts
>Priority: Major
>
> I'm trying to construct a trivial job that uses session windows, and it looks 
> like the data type parameter is hardcoded to `Object`/`AnyRef`. Due to the 
> invariance of Java classes in Scala, this means that we can't use the 
> provided session window helper classes in Scala on typed streams.
>  
> Example job:
> {code:java}
> import org.apache.flink.api.scala._
> import org.apache.flink.streaming.api.scala.{DataStream, 
> StreamExecutionEnvironment}
> import 
> org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows
> import org.apache.flink.streaming.api.windowing.time.Time
> import org.apache.flink.streaming.api.windowing.windows.{TimeWindow, Window}
> import org.apache.flink.util.Collector
> object TestJob {
>   val jobName = "TestJob"
>   def main(args: Array[String]): Unit = {
> val env = StreamExecutionEnvironment.getExecutionEnvironment
> env.fromCollection(Range(0, 100).toList)
>   .keyBy(_ / 10)
>   .window(ProcessingTimeSessionWindows.withGap(Time.minutes(1)))
>   .reduce(
> (a: Int, b: Int) => a + b,
> (key: Int, window: Window, items: Iterable[Int], out: 
> Collector[String]) => s"${key}: ${items}"
>   )
>   .map(println(_))
> env.execute(jobName)
>   }
> }{code}
>  
> Compile error:
> {code:java}
> [error]  found   : 
> org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows
> [error]  required: 
> org.apache.flink.streaming.api.windowing.assigners.WindowAssigner[_ >: Int, ?]
> [error] Note: Object <: Any (and 
> org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows
>  <: 
> org.apache.flink.streaming.api.windowing.assigners.MergingWindowAssigner[Object,org.apache.flink.streaming.api.windowing.windows.TimeWindow]),
>  but Java-defined class WindowAssigner is invariant in type T.
> [error] You may wish to investigate a wildcard type such as `_ <: Any`. (SLS 
> 3.2.10)
> [error]       
> .window(ProcessingTimeSessionWindows.withGap(Time.minutes(1))){code}
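
Not a fix for the reported invariance problem, but one possible workaround sketch: 
narrowing the assigner's hardcoded Object type parameter with an unchecked cast 
makes the example above compile (same imports and versions assumed as in the 
example):

{code}
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.{ProcessingTimeSessionWindows, WindowAssigner}
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

object SessionWindowCastWorkaround {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.fromCollection(Range(0, 100).toList)
      .keyBy(_ / 10)
      // Unchecked cast to narrow the Object type parameter; this only silences the
      // compile error and does not address the underlying API issue.
      .window(ProcessingTimeSessionWindows.withGap(Time.minutes(1))
        .asInstanceOf[WindowAssigner[Int, TimeWindow]])
      .reduce(
        (a: Int, b: Int) => a + b,
        (key: Int, window: TimeWindow, items: Iterable[Int], out: Collector[String]) =>
          out.collect(s"$key: ${items.mkString(",")}")
      )
      .print()
    env.execute("SessionWindowCastWorkaround")
  }
}
{code}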



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11201) flink-test-utils dependency issue

2018-12-21 Thread eugen yushin (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726719#comment-16726719
 ] 

eugen yushin commented on FLINK-11201:
--

thx guys for a quick turnaround

> flink-test-utils dependency issue
> -
>
> Key: FLINK-11201
> URL: https://issues.apache.org/jira/browse/FLINK-11201
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: eugen yushin
>Assignee: Till Rohrmann
>Priority: Major
> Fix For: 1.7.2, 1.8.0
>
>
> Starting with Flink 1.7, the `runtime.testutils.MiniClusterResource` class is 
> missing from the `flink-test-utils` distribution.
> Steps to reproduce (Scala code)
> build.sbt
> {code}
> name := "flink-17-test-issue"
> organization := "x.y.z"
> scalaVersion := "2.11.12"
> val flinkVersion = "1.7.0"
> libraryDependencies ++= Seq(
>   "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % Provided,
>   "org.scalatest" %% "scalatest" % "3.0.5" % Test,
>   "org.apache.flink" %% "flink-test-utils" % flinkVersion % Test
> //  ,"org.apache.flink" %% "flink-runtime" % flinkVersion % Test classifier 
> Artifact.TestsClassifier
> )
> {code}
> test class:
> {code}
> class SimpleTest extends AbstractTestBase with FlatSpecLike {
>   implicit val env: StreamExecutionEnvironment = 
> StreamExecutionEnvironment.getExecutionEnvironment
>   env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
>   env.setParallelism(1)
>   env.setRestartStrategy(RestartStrategies.noRestart())
>   "SimpleTest" should "work" in {
> val inputDs = env.fromElements(1,2,3)
> inputDs.print()
> env.execute()
>   }
> }
> {code}
> Results in:
> {code}
> A needed class was not found. This could be due to an error in your runpath. 
> Missing class: org/apache/flink/runtime/testutils/MiniClusterResource
> java.lang.NoClassDefFoundError: 
> org/apache/flink/runtime/testutils/MiniClusterResource
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.flink.runtime.testutils.MiniClusterResource
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   ... 31 more
> {code}
> This can be fixed by adding the flink-runtime distribution with the test 
> classifier to the dependency list.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11201) flink-test-utils dependency issue

2018-12-19 Thread eugen yushin (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725130#comment-16725130
 ] 

eugen yushin commented on FLINK-11201:
--

At the same time, this looks like the sbt issue mentioned in:
Transitive dependencies with classifier "test" are not include in the classpath 
#2964
https://github.com/sbt/sbt/issues/2964

Nevertheless, I think it makes sense to add a note to the docs to point this out 
explicitly, so nobody will spend extra time looking for an explanation of this 
behavior in the future.

> flink-test-utils dependency issue
> -
>
> Key: FLINK-11201
> URL: https://issues.apache.org/jira/browse/FLINK-11201
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: eugen yushin
>Priority: Major
>
> Starting with Flink 1.7, there's lack of 
> `runtime.testutils.MiniClusterResource` class in `flink-test-utils` 
> distribution.
> Steps to reproduce (Scala code)
> build.sbt
> {code}
> name := "flink-17-test-issue"
> organization := "x.y.z"
> scalaVersion := "2.11.12"
> val flinkVersion = "1.7.0"
> libraryDependencies ++= Seq(
>   "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % Provided,
>   "org.scalatest" %% "scalatest" % "3.0.5" % Test,
>   "org.apache.flink" %% "flink-test-utils" % flinkVersion % Test
> //  ,"org.apache.flink" %% "flink-runtime" % flinkVersion % Test classifier 
> Artifact.TestsClassifier
> )
> {code}
> test class:
> {code}
> class SimpleTest extends AbstractTestBase with FlatSpecLike {
>   implicit val env: StreamExecutionEnvironment = 
> StreamExecutionEnvironment.getExecutionEnvironment
>   env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
>   env.setParallelism(1)
>   env.setRestartStrategy(RestartStrategies.noRestart())
>   "SimpleTest" should "work" in {
> val inputDs = env.fromElements(1,2,3)
> inputDs.print()
> env.execute()
>   }
> }
> {code}
> Results in:
> {code}
> A needed class was not found. This could be due to an error in your runpath. 
> Missing class: org/apache/flink/runtime/testutils/MiniClusterResource
> java.lang.NoClassDefFoundError: 
> org/apache/flink/runtime/testutils/MiniClusterResource
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.flink.runtime.testutils.MiniClusterResource
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   ... 31 more
> {code}
> This can be fixed by adding flink-runtime distribution with test classifier 
> into dependencies list.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11201) flink-test-utils dependency issue

2018-12-19 Thread eugen yushin (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725115#comment-16725115
 ] 

eugen yushin commented on FLINK-11201:
--

It's a behavior change which is not reflected in the docs.
If the code from the description works fine in your environment, I have no 
questions. But it doesn't on the 2 dev machines I have.

> flink-test-utils dependency issue
> -
>
> Key: FLINK-11201
> URL: https://issues.apache.org/jira/browse/FLINK-11201
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: eugen yushin
>Priority: Major
>
> Starting with Flink 1.7, there's lack of 
> `runtime.testutils.MiniClusterResource` class in `flink-test-utils` 
> distribution.
> Steps to reproduce (Scala code)
> build.sbt
> {code}
> name := "flink-17-test-issue"
> organization := "x.y.z"
> scalaVersion := "2.11.12"
> val flinkVersion = "1.7.0"
> libraryDependencies ++= Seq(
>   "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % Provided,
>   "org.scalatest" %% "scalatest" % "3.0.5" % Test,
>   "org.apache.flink" %% "flink-test-utils" % flinkVersion % Test
> //  ,"org.apache.flink" %% "flink-runtime" % flinkVersion % Test classifier 
> Artifact.TestsClassifier
> )
> {code}
> test class:
> {code}
> class SimpleTest extends AbstractTestBase with FlatSpecLike {
>   implicit val env: StreamExecutionEnvironment = 
> StreamExecutionEnvironment.getExecutionEnvironment
>   env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
>   env.setParallelism(1)
>   env.setRestartStrategy(RestartStrategies.noRestart())
>   "SimpleTest" should "work" in {
> val inputDs = env.fromElements(1,2,3)
> inputDs.print()
> env.execute()
>   }
> }
> {code}
> Results in:
> {code}
> A needed class was not found. This could be due to an error in your runpath. 
> Missing class: org/apache/flink/runtime/testutils/MiniClusterResource
> java.lang.NoClassDefFoundError: 
> org/apache/flink/runtime/testutils/MiniClusterResource
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.flink.runtime.testutils.MiniClusterResource
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   ... 31 more
> {code}
> This can be fixed by adding flink-runtime distribution with test classifier 
> into dependencies list.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11201) flink-test-utils dependency issue

2018-12-19 Thread eugen yushin (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724931#comment-16724931
 ] 

eugen yushin commented on FLINK-11201:
--

It works like a charm in 1.6, so I think something has changed in 1.7 regarding 
dependency management for flink-test-utils.
Btw, the class from the error, 
`org.apache.flink.runtime.testutils.MiniClusterResource`, comes from the test 
sources of flink-runtime, not the main source code.

> flink-test-utils dependency issue
> -
>
> Key: FLINK-11201
> URL: https://issues.apache.org/jira/browse/FLINK-11201
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: eugen yushin
>Priority: Major
>
> Starting with Flink 1.7, there's lack of 
> `runtime.testutils.MiniClusterResource` class in `flink-test-utils` 
> distribution.
> Steps to reproduce (Scala code)
> build.sbt
> {code}
> name := "flink-17-test-issue"
> organization := "x.y.z"
> scalaVersion := "2.11.12"
> val flinkVersion = "1.7.0"
> libraryDependencies ++= Seq(
>   "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % Provided,
>   "org.scalatest" %% "scalatest" % "3.0.5" % Test,
>   "org.apache.flink" %% "flink-test-utils" % flinkVersion % Test
> //  ,"org.apache.flink" %% "flink-runtime" % flinkVersion % Test classifier 
> Artifact.TestsClassifier
> )
> {code}
> test class:
> {code}
> class SimpleTest extends AbstractTestBase with FlatSpecLike {
>   implicit val env: StreamExecutionEnvironment = 
> StreamExecutionEnvironment.getExecutionEnvironment
>   env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
>   env.setParallelism(1)
>   env.setRestartStrategy(RestartStrategies.noRestart())
>   "SimpleTest" should "work" in {
> val inputDs = env.fromElements(1,2,3)
> inputDs.print()
> env.execute()
>   }
> }
> {code}
> Results in:
> {code}
> A needed class was not found. This could be due to an error in your runpath. 
> Missing class: org/apache/flink/runtime/testutils/MiniClusterResource
> java.lang.NoClassDefFoundError: 
> org/apache/flink/runtime/testutils/MiniClusterResource
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.flink.runtime.testutils.MiniClusterResource
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   ... 31 more
> {code}
> This can be fixed by adding flink-runtime distribution with test classifier 
> into dependencies list.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11201) flink-test-utils dependency issue

2018-12-19 Thread eugen yushin (JIRA)
eugen yushin created FLINK-11201:


 Summary: flink-test-utils dependency issue
 Key: FLINK-11201
 URL: https://issues.apache.org/jira/browse/FLINK-11201
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.7.0
Reporter: eugen yushin


Starting with Flink 1.7, the `runtime.testutils.MiniClusterResource` class is 
missing from the `flink-test-utils` distribution.

Steps to reproduce (Scala code)

build.sbt
{code}
name := "flink-17-test-issue"

organization := "x.y.z"
scalaVersion := "2.11.12"
val flinkVersion = "1.7.0"

libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % Provided,
  "org.scalatest" %% "scalatest" % "3.0.5" % Test,
  "org.apache.flink" %% "flink-test-utils" % flinkVersion % Test
//  ,"org.apache.flink" %% "flink-runtime" % flinkVersion % Test classifier 
Artifact.TestsClassifier
)
{code}

test class:
{code}
class SimpleTest extends AbstractTestBase with FlatSpecLike {
  implicit val env: StreamExecutionEnvironment = 
StreamExecutionEnvironment.getExecutionEnvironment
  env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
  env.setParallelism(1)
  env.setRestartStrategy(RestartStrategies.noRestart())

  "SimpleTest" should "work" in {
val inputDs = env.fromElements(1,2,3)

inputDs.print()

env.execute()
  }
}
{code}

Results in:
{code}
A needed class was not found. This could be due to an error in your runpath. 
Missing class: org/apache/flink/runtime/testutils/MiniClusterResource
java.lang.NoClassDefFoundError: 
org/apache/flink/runtime/testutils/MiniClusterResource
...
Caused by: java.lang.ClassNotFoundException: 
org.apache.flink.runtime.testutils.MiniClusterResource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 31 more
{code}

This can be fixed by adding the flink-runtime distribution with the test 
classifier to the dependency list, as sketched below.
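
For reference, a sketch of that fix in the build.sbt above (using the standard 
"tests" classifier, which is what Artifact.TestsClassifier resolves to):

{code}
libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % Provided,
  "org.scalatest" %% "scalatest" % "3.0.5" % Test,
  "org.apache.flink" %% "flink-test-utils" % flinkVersion % Test,
  // Pull in flink-runtime's test jar, which contains runtime.testutils.MiniClusterResource.
  "org.apache.flink" %% "flink-runtime" % flinkVersion % Test classifier "tests"
)
{code}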



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-11201) flink-test-utils dependency issue

2018-12-19 Thread eugen yushin (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eugen yushin updated FLINK-11201:
-
Description: 
Starting with Flink 1.7, the `runtime.testutils.MiniClusterResource` class is 
missing from the `flink-test-utils` distribution.

Steps to reproduce (Scala code)

build.sbt
{code}
name := "flink-17-test-issue"

organization := "x.y.z"
scalaVersion := "2.11.12"
val flinkVersion = "1.7.0"

libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % Provided,
  "org.scalatest" %% "scalatest" % "3.0.5" % Test,
  "org.apache.flink" %% "flink-test-utils" % flinkVersion % Test
//  ,"org.apache.flink" %% "flink-runtime" % flinkVersion % Test classifier 
Artifact.TestsClassifier
)
{code}

test class:
{code}
class SimpleTest extends AbstractTestBase with FlatSpecLike {
  implicit val env: StreamExecutionEnvironment = 
StreamExecutionEnvironment.getExecutionEnvironment
  env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
  env.setParallelism(1)
  env.setRestartStrategy(RestartStrategies.noRestart())

  "SimpleTest" should "work" in {
val inputDs = env.fromElements(1,2,3)

inputDs.print()

env.execute()
  }
}
{code}

Results in:
{code}
A needed class was not found. This could be due to an error in your runpath. 
Missing class: org/apache/flink/runtime/testutils/MiniClusterResource
java.lang.NoClassDefFoundError: 
org/apache/flink/runtime/testutils/MiniClusterResource
...
Caused by: java.lang.ClassNotFoundException: 
org.apache.flink.runtime.testutils.MiniClusterResource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 31 more
{code}

This can be fixed by adding the flink-runtime distribution with the test 
classifier to the dependency list.

  was:
Starting with Flink 1.7, there's lack of 
`runtime.testutils.MiniClusterResource` class in `flink-test-utils` 
distribution.

Steps to reproduce (Scala code)

build.sbt
{code}
name := "flink-17-test-issue"

organization := "x.y.z"
scalaVersion := "2.11.12"
val flinkVersion = "1.7.0"

libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % Provided,
  "org.scalatest" %% "scalatest" % "3.0.5" % Test,
  "org.apache.flink" %% "flink-test-utils" % flinkVersion % Test
//  ,"org.apache.flink" %% "flink-runtime" % flinkVersion % Test classifier 
Artifact.TestsClassifier
)
{code}

test class:
{code}
class SimpleTest extends AbstractTestBase with FlatSpecLike {
  implicit val env: StreamExecutionEnvironment = 
StreamExecutionEnvironment.getExecutionEnvironment
  env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
  env.setParallelism(1)
  env.setRestartStrategy(RestartStrategies.noRestart())

  "SimpleTest" should "work" in {
val inputDs = env.fromElements(1,2,3)

inputDs.print()

env.execute()
  }
}
{code}

Results in:
{code}
A needed class was not found. This could be due to an error in your runpath. 
Missing class: org/apache/flink/runtime/testutils/MiniClusterResource
java.lang.NoClassDefFoundError: 
org/apache/flink/runtime/testutils/MiniClusterResource
...
Caused by: java.lang.ClassNotFoundException: 
org.apache.flink.runtime.testutils.MiniClusterResource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 31 more
{code}

This can be fixed by flink-runtime distribution with test classifier.


> flink-test-utils dependency issue
> -
>
> Key: FLINK-11201
> URL: https://issues.apache.org/jira/browse/FLINK-11201
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: eugen yushin
>Priority: Major
>
> Starting with Flink 1.7, there's lack of 
> `runtime.testutils.MiniClusterResource` class in `flink-test-utils` 
> distribution.
> Steps to reproduce (Scala code)
> build.sbt
> {code}
> name := "flink-17-test-issue"
> organization := "x.y.z"
> scalaVersion := "2.11.12"
> val flinkVersion = "1.7.0"
> libraryDependencies ++= Seq(
>   "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % Provided,
>   "org.scalatest" %% "scalatest" % "3.0.5" % Test,
>   "org.apache.flink" %% "flink-test-utils" % flinkVersion % Test
> //  ,"org.apache.flink" %% "flink-runtime" % flinkVersion % Test classifier 
> Artifact.TestsClassifier
> )
> {code}
> test class:
> {code}
> class SimpleTest extends AbstractTestBase with FlatSpecLike {
>   implicit val env: 

[jira] [Commented] (FLINK-5833) Support for Hive GenericUDF

2018-11-21 Thread eugen yushin (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694749#comment-16694749
 ] 

eugen yushin commented on FLINK-5833:
-

Guys, are there any updates on this?

> Support for Hive GenericUDF
> ---
>
> Key: FLINK-5833
> URL: https://issues.apache.org/jira/browse/FLINK-5833
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: Zhuoluo Yang
>Assignee: Zhuoluo Yang
>Priority: Major
>
> The second step of FLINK-5802 is to support Hive's GenericUDF.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-9544) Downgrade kinesis protocol from CBOR to JSON not possible as required by kinesalite

2018-10-12 Thread eugen yushin (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647634#comment-16647634
 ] 

eugen yushin edited comment on FLINK-9544 at 10/12/18 8:19 AM:
---

[~diego.carvallo.rl] can you post a stack trace here? While `System.setProperty` 
works well via the IDE, it doesn't work on a real cluster for me, whereas setting 
`env.java.opts` works like a charm with the vanilla Flink version.
Did you restart your cluster after updating the flink-conf file?


was (Author: eyushin):
[~diego.carvallo.rl] can you post a stack trace here? while 
`System.setProperty` works well via IDE, it doesn't work on real cluster as for 
me. While putting `env.java.opts` works like a charm with flink vanilla version.

> Downgrade kinesis protocol from CBOR to JSON not possible as required by 
> kinesalite
> ---
>
> Key: FLINK-9544
> URL: https://issues.apache.org/jira/browse/FLINK-9544
> Project: Flink
>  Issue Type: Bug
>  Components: Kinesis Connector
>Affects Versions: 1.4.0, 1.5.0, 1.4.1, 1.4.2
>Reporter: Ph.Duveau
>Priority: Critical
>
> The Amazon client does not downgrade from CBOR to JSON when setting the env 
> variable AWS_CBOR_DISABLE to true (or 1) and/or defining 
> com.amazonaws.sdk.disableCbor=true via JVM options. This bug is due to the Maven 
> shade relocation of com.amazon.* classes. As soon as you cancel this 
> relocation (by removing the relocation in the kinesis connector or by 
> re-relocating in the final jar), it runs again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9544) Downgrade kinesis protocol from CBOR to JSON not possible as required by kinesalite

2018-10-12 Thread eugen yushin (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647634#comment-16647634
 ] 

eugen yushin commented on FLINK-9544:
-

[~diego.carvallo.rl] can you post a stack trace here? While `System.setProperty` 
works well via the IDE, it doesn't work on a real cluster for me, whereas setting 
`env.java.opts` works like a charm with the vanilla Flink version.

> Downgrade kinesis protocol from CBOR to JSON not possible as required by 
> kinesalite
> ---
>
> Key: FLINK-9544
> URL: https://issues.apache.org/jira/browse/FLINK-9544
> Project: Flink
>  Issue Type: Bug
>  Components: Kinesis Connector
>Affects Versions: 1.4.0, 1.5.0, 1.4.1, 1.4.2
>Reporter: Ph.Duveau
>Priority: Critical
>
> The amazon client do not downgrade from CBOR to JSON while setting env 
> AWS_CBOR_DISABLE to true (or 1) and/or defining 
> com.amazonaws.sdk.disableCbor=true via JVM options. This bug is due to maven 
> shade relocation of com.amazon.* classes. As soon as you cancel this 
> relocation (by removing the relocation in the kinesis connector or by 
> re-relocating in the final jar), it reruns again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9544) Downgrade kinesis protocol from CBOR to JSON not possible as required by kinesalite

2018-10-10 Thread eugen yushin (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644804#comment-16644804
 ] 

eugen yushin commented on FLINK-9544:
-

another option is to set

{code:java}
# flink-conf.yaml:
env.java.opts: "-Dorg.apache.flink.kinesis.shaded.com.amazonaws.sdk.disableCbor"

{code}


> Downgrade kinesis protocol from CBOR to JSON not possible as required by 
> kinesalite
> ---
>
> Key: FLINK-9544
> URL: https://issues.apache.org/jira/browse/FLINK-9544
> Project: Flink
>  Issue Type: Bug
>  Components: Kinesis Connector
>Affects Versions: 1.4.0, 1.5.0, 1.4.1, 1.4.2
>Reporter: Ph.Duveau
>Priority: Critical
>
> The amazon client do not downgrade from CBOR to JSON while setting env 
> AWS_CBOR_DISABLE to true (or 1) and/or defining 
> com.amazonaws.sdk.disableCbor=true via JVM options. This bug is due to maven 
> shade relocation of com.amazon.* classes. As soon as you cancel this 
> relocation (by removing the relocation in the kinesis connector or by 
> re-relocating in the final jar), it reruns again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10496) CommandLineParser arguments interleaving

2018-10-05 Thread eugen yushin (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eugen yushin updated FLINK-10496:
-
Issue Type: Bug  (was: Improvement)

> CommandLineParser arguments interleaving
> 
>
> Key: FLINK-10496
> URL: https://issues.apache.org/jira/browse/FLINK-10496
> Project: Flink
>  Issue Type: Bug
>  Components: Configuration, Core, Docker, Java API
>Affects Versions: 1.6.1, 1.7.0
>Reporter: eugen yushin
>Assignee: eugen yushin
>Priority: Major
>  Labels: pull-request-available
>
> *Business case:*
> Run Flink job cluster within Docker/k8s. Job takes an argument called 
> `--config` which can't be recognized in runtime.
> {code:java}
> Caused by: java.lang.RuntimeException: No data for required key 'config'
> {code}
> *Problem statement:*
> Command line parser can't recognize job specific arguments when they have the 
> same prefix as Flink's ones.
> e.g.
> [https://github.com/apache/flink/blob/master/flink-container/src/test/java/org/apache/flink/container/entrypoint/StandaloneJobClusterConfigurationParserFactoryTest.java#L52]
> the following args results in failure:
> {code:java}
> final String arg1 = "--config";
> final String arg2 = "/path/to/job.yaml";{code}
> *Reason*:
> Apache CLI parser use string prefix matching to parse options and adds extra 
> --configDir to result set instead of adding new --config.
> https://github.com/apache/commons-cli/blob/cli-1.3.1/src/main/java/org/apache/commons/cli/DefaultParser.java#L391
> *Scope*:
> Update commons-cli dependency with version 1.4 which has flag to disable 
> partial matching.
> https://github.com/apache/commons-cli/commit/bdb4a09ceaceab7e3d214b1beadb93bd9c911342
> Update Flink's command line parser to utilize this feature.
> https://github.com/apache/flink/blob/6258a4c333ce9dba914621b13eac2f7d91f5cb72/flink-runtime/src/main/java/org/apache/flink/runtime/entrypoint/parser/CommandLineParser.java#L45



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10496) CommandLineParser arguments interleaving

2018-10-05 Thread eugen yushin (JIRA)
eugen yushin created FLINK-10496:


 Summary: CommandLineParser arguments interleaving
 Key: FLINK-10496
 URL: https://issues.apache.org/jira/browse/FLINK-10496
 Project: Flink
  Issue Type: Improvement
  Components: Configuration, Core, Java API
Affects Versions: 1.6.1, 1.7.0
Reporter: eugen yushin
Assignee: eugen yushin


*Business case:*
Run a Flink job cluster within Docker/k8s. The job takes an argument called 
`--config` which is not recognized at runtime.

{code:java}
Caused by: java.lang.RuntimeException: No data for required key 'config'
{code}

*Problem statement:*
The command line parser can't recognize job-specific arguments when they share a 
prefix with Flink's own options.

e.g.
[https://github.com/apache/flink/blob/master/flink-container/src/test/java/org/apache/flink/container/entrypoint/StandaloneJobClusterConfigurationParserFactoryTest.java#L52]

the following arguments result in a failure:
{code:java}
final String arg1 = "--config";
final String arg2 = "/path/to/job.yaml";{code}

*Reason*:
The Apache Commons CLI parser uses string prefix matching when parsing options, so 
it adds an extra --configDir entry to the result set instead of a new --config one.
https://github.com/apache/commons-cli/blob/cli-1.3.1/src/main/java/org/apache/commons/cli/DefaultParser.java#L391

*Scope*:
Update the commons-cli dependency to version 1.4, which has a flag to disable 
partial matching.
https://github.com/apache/commons-cli/commit/bdb4a09ceaceab7e3d214b1beadb93bd9c911342

Update Flink's command line parser to utilize this feature.
https://github.com/apache/flink/blob/6258a4c333ce9dba914621b13eac2f7d91f5cb72/flink-runtime/src/main/java/org/apache/flink/runtime/entrypoint/parser/CommandLineParser.java#L45



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10496) CommandLineParser arguments interleaving

2018-10-05 Thread eugen yushin (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eugen yushin updated FLINK-10496:
-
Labels: pull-request-available  (was: )

> CommandLineParser arguments interleaving
> 
>
> Key: FLINK-10496
> URL: https://issues.apache.org/jira/browse/FLINK-10496
> Project: Flink
>  Issue Type: Improvement
>  Components: Configuration, Core, Docker, Java API
>Affects Versions: 1.6.1, 1.7.0
>Reporter: eugen yushin
>Assignee: eugen yushin
>Priority: Major
>  Labels: pull-request-available
>
> *Business case:*
> Run Flink job cluster within Docker/k8s. Job takes an argument called 
> `--config` which can't be recognized in runtime.
> {code:java}
> Caused by: java.lang.RuntimeException: No data for required key 'config'
> {code}
> *Problem statement:*
> Command line parser can't recognize job specific arguments when they have the 
> same prefix as Flink's ones.
> e.g.
> [https://github.com/apache/flink/blob/master/flink-container/src/test/java/org/apache/flink/container/entrypoint/StandaloneJobClusterConfigurationParserFactoryTest.java#L52]
> the following args results in failure:
> {code:java}
> final String arg1 = "--config";
> final String arg2 = "/path/to/job.yaml";{code}
> *Reason*:
> Apache CLI parser use string prefix matching to parse options and adds extra 
> --configDir to result set instead of adding new --config.
> https://github.com/apache/commons-cli/blob/cli-1.3.1/src/main/java/org/apache/commons/cli/DefaultParser.java#L391
> *Scope*:
> Update commons-cli dependency with version 1.4 which has flag to disable 
> partial matching.
> https://github.com/apache/commons-cli/commit/bdb4a09ceaceab7e3d214b1beadb93bd9c911342
> Update Flink's command line parser to utilize this feature.
> https://github.com/apache/flink/blob/6258a4c333ce9dba914621b13eac2f7d91f5cb72/flink-runtime/src/main/java/org/apache/flink/runtime/entrypoint/parser/CommandLineParser.java#L45



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10496) CommandLineParser arguments interleaving

2018-10-05 Thread eugen yushin (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eugen yushin updated FLINK-10496:
-
Component/s: Docker

> CommandLineParser arguments interleaving
> 
>
> Key: FLINK-10496
> URL: https://issues.apache.org/jira/browse/FLINK-10496
> Project: Flink
>  Issue Type: Improvement
>  Components: Configuration, Core, Docker, Java API
>Affects Versions: 1.6.1, 1.7.0
>Reporter: eugen yushin
>Assignee: eugen yushin
>Priority: Major
>  Labels: pull-request-available
>
> *Business case:*
> Run Flink job cluster within Docker/k8s. Job takes an argument called 
> `--config` which can't be recognized in runtime.
> {code:java}
> Caused by: java.lang.RuntimeException: No data for required key 'config'
> {code}
> *Problem statement:*
> Command line parser can't recognize job specific arguments when they have the 
> same prefix as Flink's ones.
> e.g.
> [https://github.com/apache/flink/blob/master/flink-container/src/test/java/org/apache/flink/container/entrypoint/StandaloneJobClusterConfigurationParserFactoryTest.java#L52]
> the following args results in failure:
> {code:java}
> final String arg1 = "--config";
> final String arg2 = "/path/to/job.yaml";{code}
> *Reason*:
> Apache CLI parser use string prefix matching to parse options and adds extra 
> --configDir to result set instead of adding new --config.
> https://github.com/apache/commons-cli/blob/cli-1.3.1/src/main/java/org/apache/commons/cli/DefaultParser.java#L391
> *Scope*:
> Update commons-cli dependency with version 1.4 which has flag to disable 
> partial matching.
> https://github.com/apache/commons-cli/commit/bdb4a09ceaceab7e3d214b1beadb93bd9c911342
> Update Flink's command line parser to utilize this feature.
> https://github.com/apache/flink/blob/6258a4c333ce9dba914621b13eac2f7d91f5cb72/flink-runtime/src/main/java/org/apache/flink/runtime/entrypoint/parser/CommandLineParser.java#L45



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10422) Follow AWS specs in Kinesis Consumer

2018-10-03 Thread eugen yushin (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16636775#comment-16636775
 ] 

eugen yushin commented on FLINK-10422:
--

[~tzulitai] it looks like you've contributed a lot to the Kinesis connector; can 
you please check the proposed changes or let me know who would be a better person 
to ask for a code review?

Regards

> Follow AWS specs in Kinesis Consumer 
> -
>
> Key: FLINK-10422
> URL: https://issues.apache.org/jira/browse/FLINK-10422
> Project: Flink
>  Issue Type: Improvement
>  Components: Kinesis Connector
>Affects Versions: 1.6.1
>Reporter: eugen yushin
>Assignee: eugen yushin
>Priority: Major
>  Labels: pull-request-available
>
> *Related conversation in mailing list:*
> [https://lists.apache.org/thread.html/96de3bac9761564767cf283b58d664f5ae1b076e0c4431620552af5b@%3Cdev.flink.apache.org%3E]
> *Summary:*
> Flink Kinesis consumer checks shards id for a particular pattern:
> {noformat}
> "^shardId-\\d{12}"
> {noformat}
> [https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/model/StreamShardHandle.java#L132]
> While this inlines with current Kinesis streams server implementation (all 
> streams follows this pattern), it confronts with AWS docs:
>  
> {code:java}
> ShardId
>  The unique identifier of the shard within the stream.
>  Type: String
>  Length Constraints: Minimum length of 1. Maximum length of 128.
> Pattern: [a-zA-Z0-9_.-]+
>  Required: Yes
> {code}
>  
> [https://docs.aws.amazon.com/kinesis/latest/APIReference/API_Shard.html]
> *Intention:*
>  We have no guarantees and can't rely on patterns other than provided in AWS 
> manifest.
>  Any custom implementation of Kinesis mock should rely on AWS manifest which 
> claims ShardID to be alfanums. This prevents anyone to use Flink with such 
> kind of mocks.
> The reason behind the scene to use particular pattern "^shardId-\\d{12}" is to 
> create Flink's custom Shard comparator, filter already seen shards, and pass 
> latest shard for client.listShards only to limit the scope for RPC call to 
> AWS.
> In the meantime, I think we can get rid of this logic at all. The current 
> usage in project is:
>  - fix Kinesalite bug (I've already opened an issue to cover this:
>  [https://github.com/mhart/kinesalite/issues/76] and opened PR: 
> [https://github.com/mhart/kinesalite/pull/77]). We can move this logic to 
> test code base to keep production code clean for now
>  
> [https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java#L464]
>  - adjust last seen shard id. We can simply omit this cause' AWS client won't 
> return already seen shards and we will have new ids only or nothing.
> [https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/internals/KinesisDataFetcher.java#L475]
>  
> [https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java#L406]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10422) Follow AWS specs in Kinesis Consumer

2018-09-25 Thread eugen yushin (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eugen yushin updated FLINK-10422:
-
Description: 
*Related conversation in mailing list:*

[https://lists.apache.org/thread.html/96de3bac9761564767cf283b58d664f5ae1b076e0c4431620552af5b@%3Cdev.flink.apache.org%3E]

*Summary:*

The Flink Kinesis consumer checks shard ids against a particular pattern:
{noformat}
"^shardId-\\d{12}"
{noformat}
[https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/model/StreamShardHandle.java#L132]

While this is in line with the current Kinesis Streams server implementation (all 
streams follow this pattern), it conflicts with the AWS docs:

 
{code:java}
ShardId
 The unique identifier of the shard within the stream.
 Type: String
 Length Constraints: Minimum length of 1. Maximum length of 128.
Pattern: [a-zA-Z0-9_.-]+
 Required: Yes
{code}
 

[https://docs.aws.amazon.com/kinesis/latest/APIReference/API_Shard.html]

*Intention:*
 We have no guarantees and can't rely on patterns other than those provided in the 
AWS manifest.
 Any custom Kinesis mock implementation should rely on the AWS manifest, which 
defines ShardId as an alphanumeric string. The current check prevents anyone from 
using Flink with such mocks.

The reason behind using the particular pattern "^shardId-\\d{12}" is to create 
Flink's custom Shard comparator, filter already-seen shards, and pass only the 
latest shard to client.listShards in order to limit the scope of the RPC call to AWS.

In the meantime, I think we can get rid of this logic altogether. The current usage 
in the project is:
 - fix a Kinesalite bug (I've already opened an issue to cover this:
 [https://github.com/mhart/kinesalite/issues/76] and opened PR: 
[https://github.com/mhart/kinesalite/pull/77]). We can move this logic to the test 
code base to keep the production code clean for now
 
[https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java#L464]

 - adjust the last seen shard id. We can simply omit this because the AWS client 
won't return already-seen shards; we will get only new ids or nothing.
[https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/internals/KinesisDataFetcher.java#L475]
 
[https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java#L406]

  was:
*Related conversation in mailing list:*

[https://lists.apache.org/thread.html/96de3bac9761564767cf283b58d664f5ae1b076e0c4431620552af5b@%3Cdev.flink.apache.org%3E]

*Summary:*

Flink Kinesis consumer checks shards id for a particular pattern:
{noformat}
"^shardId-\\d{12}"
{noformat}
[https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/model/StreamShardHandle.java#L132]

While this inlines with current Kinesis streams server implementation (all
 streams follows this pattern), it confronts with AWS docs:

 
{code:java}
ShardId
 The unique identifier of the shard within the stream.
 Type: String
 Length Constraints: Minimum length of 1. Maximum length of 128.
Pattern: [a-zA-Z0-9_.-]+
 Required: Yes
{code}
 

[https://docs.aws.amazon.com/kinesis/latest/APIReference/API_Shard.html]

*Intention:*
 We have no guarantees and can't rely on patterns other than provided in AWS
 manifest.
 Any custom implementation of Kinesis mock should rely on AWS manifest which
 claims ShardID to be alfanums. This prevents anyone to use Flink with such
 kind of mocks.

The reason behind the scene to use particular pattern "^shardId-\\d{12}" is to 
create Flink's custom Shard comparator, filter already seen shards, and pass 
latest shard for client.listShards only to limit the scope for RPC call to AWS.

In the meantime, I think we can get rid of this logic at all. The current
 usage in project is:
 - fix Kinesalite bug (I've already opened an issue to cover this:
 [https://github.com/mhart/kinesalite/issues/76] and opened PR: 
[https://github.com/mhart/kinesalite/pull/77]). We can move this logic to
 test code base to keep production code clean for now
 
[https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java#L464]

 - adjust last seen shard id. We can simply omit this cause' AWS client
 won't return already seen shards and we will have new ids only or nothing.
 

[jira] [Updated] (FLINK-10422) Follow AWS specs in Kinesis Consumer

2018-09-25 Thread eugen yushin (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eugen yushin updated FLINK-10422:
-
Labels: pull-request-available  (was: )

> Follow AWS specs in Kinesis Consumer 
> -
>
> Key: FLINK-10422
> URL: https://issues.apache.org/jira/browse/FLINK-10422
> Project: Flink
>  Issue Type: Improvement
>  Components: Kinesis Connector
>Affects Versions: 1.6.1
>Reporter: eugen yushin
>Priority: Major
>  Labels: pull-request-available
>
> *Related conversation in mailing list:*
> [https://lists.apache.org/thread.html/96de3bac9761564767cf283b58d664f5ae1b076e0c4431620552af5b@%3Cdev.flink.apache.org%3E]
> *Summary:*
> Flink Kinesis consumer checks shards id for a particular pattern:
> {noformat}
> "^shardId-\\d{12}"
> {noformat}
> [https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/model/StreamShardHandle.java#L132]
> While this inlines with current Kinesis streams server implementation (all
>  streams follows this pattern), it confronts with AWS docs:
>  
> {code:java}
> ShardId
>  The unique identifier of the shard within the stream.
>  Type: String
>  Length Constraints: Minimum length of 1. Maximum length of 128.
> Pattern: [a-zA-Z0-9_.-]+
>  Required: Yes
> {code}
>  
> [https://docs.aws.amazon.com/kinesis/latest/APIReference/API_Shard.html]
> *Intention:*
>  We have no guarantees and can't rely on patterns other than provided in AWS
>  manifest.
>  Any custom implementation of Kinesis mock should rely on AWS manifest which
>  claims ShardID to be alfanums. This prevents anyone to use Flink with such
>  kind of mocks.
> The reason behind the scene to use particular pattern "^shardId-\\d{12}" is to 
> create Flink's custom Shard comparator, filter already seen shards, and pass 
> latest shard for client.listShards only to limit the scope for RPC call to AWS.
> In the meantime, I think we can get rid of this logic at all. The current
>  usage in project is:
>  - fix Kinesalite bug (I've already opened an issue to cover this:
>  [https://github.com/mhart/kinesalite/issues/76] and opened PR: 
> [https://github.com/mhart/kinesalite/pull/77]). We can move this logic to
>  test code base to keep production code clean for now
>  
> [https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java#L464]
>  - adjusting the last seen shard id. We can simply omit this because the AWS
>  client won't return already seen shards, so we will get new ids only or nothing.
>  
> [https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/internals/KinesisDataFetcher.java#L475]
>  
> [https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java#L406]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10422) Follow AWS specs in Kinesis Consumer

2018-09-25 Thread eugen yushin (JIRA)
eugen yushin created FLINK-10422:


 Summary: Follow AWS specs in Kinesis Consumer 
 Key: FLINK-10422
 URL: https://issues.apache.org/jira/browse/FLINK-10422
 Project: Flink
  Issue Type: Improvement
  Components: Kinesis Connector
Affects Versions: 1.6.1
Reporter: eugen yushin


*Related conversation in mailing list:*

[https://lists.apache.org/thread.html/96de3bac9761564767cf283b58d664f5ae1b076e0c4431620552af5b@%3Cdev.flink.apache.org%3E]

*Summary:*

The Flink Kinesis consumer checks shard ids against a particular pattern:
{noformat}
"^shardId-\\d{12}"
{noformat}
[https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/model/StreamShardHandle.java#L132]

While this aligns with the current Kinesis streams server implementation (all
 streams follow this pattern), it conflicts with the AWS docs:

 
{code:java}
ShardId
 The unique identifier of the shard within the stream.
 Type: String
 Length Constraints: Minimum length of 1. Maximum length of 128.
Pattern: [a-zA-Z0-9_.-]+
 Required: Yes
{code}
 

[https://docs.aws.amazon.com/kinesis/latest/APIReference/API_Shard.html]

*Intention:*
 We have no guarantees and can't rely on patterns other than those provided in
 the AWS manifest.
 Any custom Kinesis mock implementation should rely on the AWS manifest, which
 claims ShardID to be alphanumeric. This prevents anyone from using Flink with
 such mocks.
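
To make the mismatch concrete, here is a small self-contained check (the second shard id is a
made-up value that is valid per the AWS spec but rejected by Flink's pattern):

{code:java}
import java.util.regex.Pattern;

public class ShardIdPatternCheck {

    public static void main(String[] args) {
        Pattern flinkPattern = Pattern.compile("^shardId-\\d{12}");
        // Matches the ids real Kinesis streams produce today...
        System.out.println(flinkPattern.matcher("shardId-000000000001").find()); // true
        // ...but rejects an id that still satisfies the AWS constraint [a-zA-Z0-9_.-]+
        System.out.println(flinkPattern.matcher("shard_A.1").find());            // false
    }
}
{code}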

The reason behind using the particular pattern "^shardId-\\d{12}" is to create
Flink's custom Shard comparator, filter already seen shards, and pass only the
latest shard to client.listShards to limit the scope of the RPC call to AWS.

In the meantime, I think we can get rid of this logic altogether. Its current
 usage in the project is:
 - fixing a Kinesalite bug (I've already opened an issue to cover this:
 [https://github.com/mhart/kinesalite/issues/76] and opened a PR:
 [https://github.com/mhart/kinesalite/pull/77]). We can move this logic to the
 test code base to keep the production code clean for now
 
[https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java#L464]

 - adjusting the last seen shard id. We can simply omit this because the AWS
 client won't return already seen shards, so we will get new ids only or nothing.
 
[https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/internals/KinesisDataFetcher.java#L475]
 
[https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java#L406]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9544) Downgrade kinesis protocol from CBOR to JSON not possible as required by kinesalite

2018-09-13 Thread eugen yushin (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613142#comment-16613142
 ] 

eugen yushin commented on FLINK-9544:
-

The following code snippet works for me with Flink 1.6:

{code:scala}
import org.apache.flink.kinesis.shaded.com.amazonaws.SDKGlobalConfiguration

System.setProperty(SDKGlobalConfiguration.AWS_CBOR_DISABLE_SYSTEM_PROPERTY, "true")
{code}
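
For a Java job, a rough equivalent of the snippet above (assuming the same shaded class path)
would be:

{code:java}
import org.apache.flink.kinesis.shaded.com.amazonaws.SDKGlobalConfiguration;

public class DisableCborSketch {

    public static void main(String[] args) throws Exception {
        // Must run before the Kinesis consumer creates its AWS client,
        // otherwise the client is already configured for CBOR.
        System.setProperty(SDKGlobalConfiguration.AWS_CBOR_DISABLE_SYSTEM_PROPERTY, "true");

        // ... build and execute the Flink job here ...
    }
}
{code}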

> Downgrade kinesis protocol from CBOR to JSON not possible as required by 
> kinesalite
> ---
>
> Key: FLINK-9544
> URL: https://issues.apache.org/jira/browse/FLINK-9544
> Project: Flink
>  Issue Type: Bug
>  Components: Kinesis Connector
>Affects Versions: 1.4.0, 1.5.0, 1.4.1, 1.4.2
>Reporter: Ph.Duveau
>Priority: Critical
>
> The Amazon client does not downgrade from CBOR to JSON when setting the env 
> variable AWS_CBOR_DISABLE to true (or 1) and/or defining 
> com.amazonaws.sdk.disableCbor=true via JVM options. This bug is due to the maven 
> shade relocation of com.amazon.* classes. As soon as you cancel this 
> relocation (by removing the relocation in the kinesis connector or by 
> re-relocating in the final jar), it works again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-10050) Support 'allowedLateness' in CoGroupedStreams

2018-09-02 Thread eugen yushin (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601172#comment-16601172
 ] 

eugen yushin edited comment on FLINK-10050 at 9/2/18 10:28 AM:
---

[~aljoscha], [~till.rohrmann]
 Guys, could you please take a look at the PR?
 I didn't add unit tests because:
 a. there are no mock tests for the referenced files in the master branch to cover 
such kinds of delegates as evictor/trigger/...
 b. 'allowedLateness' is a feature of WindowedStream, and the proposed fix simply 
delegates all the work to the WindowedStream logic

Regards


was (Author: eyushin):
[~aljoscha], [~till.rohrmann]
Guys, please can you take a look at PR?
I didn't add unit tests because of:
a. there're no mock tests for referenced files in master branch to cover such 
kind of delegates as evictor/trigger/...
b. 'allowedLateness' is an feature of WindowedStream, and proposed fix simply 
delegates all the work to WindowedStream logic

Regards

> Support 'allowedLateness' in CoGroupedStreams
> -
>
> Key: FLINK-10050
> URL: https://issues.apache.org/jira/browse/FLINK-10050
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.5.1, 1.6.0
>Reporter: eugen yushin
>Priority: Major
>  Labels: pull-request-available, ready-to-commit, windows
>
> WindowedStream supports the 'allowedLateness' feature, while CoGroupedStreams 
> does not. At the same time, WindowedStream is an inner part 
> of CoGroupedStreams and all main functionality (like evictor/trigger/...) is 
> simply delegated to WindowedStream.
> There is no way to handle late-arriving data from previous steps in 
> cogroups (and joins). Consider the following flow:
> a. read data from source1 -> aggregate data with allowed lateness
> b. read data from source2 -> aggregate data with allowed lateness
> c. cogroup/join streams a and b, and compare aggregated values
> Step c doesn't accept any late data from steps a/b due to the lack of an 
> `allowedLateness` API call in CoGroupedStreams.java.
> Scope: add the method `WithWindow.allowedLateness` to the Java API 
> (flink-streaming-java) and extend the Scala API (flink-streaming-scala).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10050) Support 'allowedLateness' in CoGroupedStreams

2018-09-02 Thread eugen yushin (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601172#comment-16601172
 ] 

eugen yushin commented on FLINK-10050:
--

[~aljoscha], [~till.rohrmann]
Guys, could you please take a look at the PR?
I didn't add unit tests because:
a. there are no mock tests for the referenced files in the master branch to cover 
such kinds of delegates as evictor/trigger/...
b. 'allowedLateness' is a feature of WindowedStream, and the proposed fix simply 
delegates all the work to the WindowedStream logic

Regards

> Support 'allowedLateness' in CoGroupedStreams
> -
>
> Key: FLINK-10050
> URL: https://issues.apache.org/jira/browse/FLINK-10050
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.5.1, 1.6.0
>Reporter: eugen yushin
>Priority: Major
>  Labels: pull-request-available, ready-to-commit, windows
>
> WindowedStream supports the 'allowedLateness' feature, while CoGroupedStreams 
> does not. At the same time, WindowedStream is an inner part 
> of CoGroupedStreams and all main functionality (like evictor/trigger/...) is 
> simply delegated to WindowedStream.
> There is no way to handle late-arriving data from previous steps in 
> cogroups (and joins). Consider the following flow:
> a. read data from source1 -> aggregate data with allowed lateness
> b. read data from source2 -> aggregate data with allowed lateness
> c. cogroup/join streams a and b, and compare aggregated values
> Step c doesn't accept any late data from steps a/b due to the lack of an 
> `allowedLateness` API call in CoGroupedStreams.java.
> Scope: add the method `WithWindow.allowedLateness` to the Java API 
> (flink-streaming-java) and extend the Scala API (flink-streaming-scala).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10050) Support 'allowedLateness' in CoGroupedStreams

2018-08-30 Thread eugen yushin (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597537#comment-16597537
 ] 

eugen yushin commented on FLINK-10050:
--

Thanks, I'm proceeding with the PR then.
Will keep you posted.

> Support 'allowedLateness' in CoGroupedStreams
> -
>
> Key: FLINK-10050
> URL: https://issues.apache.org/jira/browse/FLINK-10050
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.5.1, 1.6.0
>Reporter: eugen yushin
>Priority: Major
>  Labels: ready-to-commit, windows
>
> WindowedStream supports the 'allowedLateness' feature, while CoGroupedStreams 
> does not. At the same time, WindowedStream is an inner part 
> of CoGroupedStreams and all main functionality (like evictor/trigger/...) is 
> simply delegated to WindowedStream.
> There is no way to handle late-arriving data from previous steps in 
> cogroups (and joins). Consider the following flow:
> a. read data from source1 -> aggregate data with allowed lateness
> b. read data from source2 -> aggregate data with allowed lateness
> c. cogroup/join streams a and b, and compare aggregated values
> Step c doesn't accept any late data from steps a/b due to the lack of an 
> `allowedLateness` API call in CoGroupedStreams.java.
> Scope: add the method `WithWindow.allowedLateness` to the Java API 
> (flink-streaming-java) and extend the Scala API (flink-streaming-scala).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-10050) Support 'allowedLateness' in CoGroupedStreams

2018-08-24 Thread eugen yushin (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591511#comment-16591511
 ] 

eugen yushin edited comment on FLINK-10050 at 8/24/18 11:31 AM:


[~aljoscha] No information about windows is retained in Flink for any 
consecutive operator. Docs:
 
[https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/windows.html#working-with-window-results]
{code:java}
 The result of a windowed operation is again a {{DataStream}}, no information 
about the windowed operations is retained in the result elements
{code}
At the same time, coGroup/join keeps the elements' timestamps, and consecutive 
operators can assign elements to their respective windows. Docs:
 
[https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/joining.html#window-join]
{code:java}
 Those elements that do get joined will have as their timestamp the largest 
timestamp that still lies in the respective window. For example a window with 
{{[5, 10)}} as its boundaries would result in the joined elements having 9 as 
their timestamp.
{code}
Business case: 2 streams, one with different business metrics, the other with 
similar metrics taken from microservice logs; the result is a reconciliation of 
these 2 streams. No operators other than a sink are needed for this particular 
business case. A minimal sketch of this flow is shown below.
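
A sketch of this flow, using a plain (metricName, value) tuple instead of a real metric type
and leaving the sources and timestamp/watermark assignment out, so this illustrates the shape
of the pipeline only (all names are hypothetical):

{code:java}
import org.apache.flink.api.common.functions.CoGroupFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

// Hypothetical inputs with event-time timestamps and watermarks already assigned upstream.
DataStream<Tuple2<String, Long>> businessMetrics = ...;
DataStream<Tuple2<String, Long>> logMetrics = ...;

KeySelector<Tuple2<String, Long>, String> byName =
        new KeySelector<Tuple2<String, Long>, String>() {
            @Override
            public String getKey(Tuple2<String, Long> value) {
                return value.f0;
            }
        };

// Steps a/b: windowed aggregation with allowed lateness (supported on WindowedStream today).
DataStream<Tuple2<String, Long>> aggBusiness = businessMetrics
        .keyBy(byName)
        .window(TumblingEventTimeWindows.of(Time.minutes(5)))
        .allowedLateness(Time.minutes(1))
        .sum(1);

DataStream<Tuple2<String, Long>> aggLogs = logMetrics
        .keyBy(byName)
        .window(TumblingEventTimeWindows.of(Time.minutes(5)))
        .allowedLateness(Time.minutes(1))
        .sum(1);

// Step c: reconcile the two aggregated streams. The co-grouped WithWindow has no
// allowedLateness(...) today, so late updates produced by steps a/b are dropped here.
DataStream<String> reconciliation = aggBusiness
        .coGroup(aggLogs)
        .where(byName)
        .equalTo(byName)
        .window(TumblingEventTimeWindows.of(Time.minutes(5)))
        .apply(new CoGroupFunction<Tuple2<String, Long>, Tuple2<String, Long>, String>() {
            @Override
            public void coGroup(Iterable<Tuple2<String, Long>> business,
                                Iterable<Tuple2<String, Long>> logs,
                                Collector<String> out) {
                out.collect("business=" + business + " vs logs=" + logs);
            }
        });
{code}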


was (Author: eyushin):
[~aljoscha] There's no info about windows for any consecutive of operator in 
Flink. Docs:
 
[https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/windows.html#working-with-window-results]
{code:java}
 The result of a windowed operation is again a {{DataStream}}, no information 
about the windowed operations is retained in the result elements
{code}
At the same time, coGroup/join keeps element's timestamps and consecutive 
operators can assign elements to respective windows. Docs:
 
[https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/joining.html#window-join]
{code:java}
 Those elements that do get joined will have as their timestamp the largest 
timestamp that still lies in the respective window. For example a window with 
{{[5, 10)}} as its boundaries would result in the joined elements having 9 as 
their timestamp.
{code}
Business case: 2 streams, 1 for different business metrics, another one - 
similar metrics but from microservices logs, result - reconciliation of these 2 
streams. No other operators except sink are need for this particular business 
case.

> Support 'allowedLateness' in CoGroupedStreams
> -
>
> Key: FLINK-10050
> URL: https://issues.apache.org/jira/browse/FLINK-10050
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.5.1, 1.6.0
>Reporter: eugen yushin
>Priority: Major
>  Labels: ready-to-commit, windows
>
> WindowedStream supports the 'allowedLateness' feature, while CoGroupedStreams 
> does not. At the same time, WindowedStream is an inner part 
> of CoGroupedStreams and all main functionality (like evictor/trigger/...) is 
> simply delegated to WindowedStream.
> There is no way to handle late-arriving data from previous steps in 
> cogroups (and joins). Consider the following flow:
> a. read data from source1 -> aggregate data with allowed lateness
> b. read data from source2 -> aggregate data with allowed lateness
> c. cogroup/join streams a and b, and compare aggregated values
> Step c doesn't accept any late data from steps a/b due to the lack of an 
> `allowedLateness` API call in CoGroupedStreams.java.
> Scope: add the method `WithWindow.allowedLateness` to the Java API 
> (flink-streaming-java) and extend the Scala API (flink-streaming-scala).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-10050) Support 'allowedLateness' in CoGroupedStreams

2018-08-24 Thread eugen yushin (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591511#comment-16591511
 ] 

eugen yushin edited comment on FLINK-10050 at 8/24/18 11:30 AM:


[~aljoscha] There's no info about windows for any consecutive operator in 
Flink. Docs:
 
[https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/windows.html#working-with-window-results]
{code:java}
 The result of a windowed operation is again a {{DataStream}}, no information 
about the windowed operations is retained in the result elements
{code}
At the same time, coGroup/join keeps the elements' timestamps, and consecutive 
operators can assign elements to their respective windows. Docs:
 
[https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/joining.html#window-join]
{code:java}
 Those elements that do get joined will have as their timestamp the largest 
timestamp that still lies in the respective window. For example a window with 
{{[5, 10)}} as its boundaries would result in the joined elements having 9 as 
their timestamp.
{code}
Business case: 2 streams, one with different business metrics, the other with 
similar metrics taken from microservice logs; the result is a reconciliation of 
these 2 streams. No operators other than a sink are needed for this particular 
business case.


was (Author: eyushin):
[~aljoscha] There's no info about windows for any of operator in Flink. Docs:
 
[https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/windows.html#working-with-window-results]
{code}
 The result of a windowed operation is again a {{DataStream}}, no information 
about the windowed operations is retained in the result elements
{code}

At the same time, coGroup/join keeps element's timestamps and consecutive 
operators can assign elements to respective windows. Docs:
 
[https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/joining.html#window-join]
{code}
 Those elements that do get joined will have as their timestamp the largest 
timestamp that still lies in the respective window. For example a window with 
{{[5, 10)}} as its boundaries would result in the joined elements having 9 as 
their timestamp.
{code}

Business case: 2 streams, 1 for different business metrics, another one - 
similar metrics but from microservices logs, result - reconciliation of these 2 
streams. No other operators except sink are need for this particular business 
case.

> Support 'allowedLateness' in CoGroupedStreams
> -
>
> Key: FLINK-10050
> URL: https://issues.apache.org/jira/browse/FLINK-10050
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.5.1, 1.6.0
>Reporter: eugen yushin
>Priority: Major
>  Labels: ready-to-commit, windows
>
> WindowedStream supports the 'allowedLateness' feature, while CoGroupedStreams 
> does not. At the same time, WindowedStream is an inner part 
> of CoGroupedStreams and all main functionality (like evictor/trigger/...) is 
> simply delegated to WindowedStream.
> There is no way to handle late-arriving data from previous steps in 
> cogroups (and joins). Consider the following flow:
> a. read data from source1 -> aggregate data with allowed lateness
> b. read data from source2 -> aggregate data with allowed lateness
> c. cogroup/join streams a and b, and compare aggregated values
> Step c doesn't accept any late data from steps a/b due to the lack of an 
> `allowedLateness` API call in CoGroupedStreams.java.
> Scope: add the method `WithWindow.allowedLateness` to the Java API 
> (flink-streaming-java) and extend the Scala API (flink-streaming-scala).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-10050) Support 'allowedLateness' in CoGroupedStreams

2018-08-24 Thread eugen yushin (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591511#comment-16591511
 ] 

eugen yushin edited comment on FLINK-10050 at 8/24/18 11:29 AM:


[~aljoscha] There's no info about windows for any operator in Flink. Docs:
 
[https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/windows.html#working-with-window-results]
{code}
 The result of a windowed operation is again a {{DataStream}}, no information 
about the windowed operations is retained in the result elements
{code}

At the same time, coGroup/join keeps the elements' timestamps, and consecutive 
operators can assign elements to their respective windows. Docs:
 
[https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/joining.html#window-join]
{code}
 Those elements that do get joined will have as their timestamp the largest 
timestamp that still lies in the respective window. For example a window with 
{{[5, 10)}} as its boundaries would result in the joined elements having 9 as 
their timestamp.
{code}

Business case: 2 streams, one with different business metrics, the other with 
similar metrics taken from microservice logs; the result is a reconciliation of 
these 2 streams. No operators other than a sink are needed for this particular 
business case.


was (Author: eyushin):
[~aljoscha] There's no info about windows for any of operator in Flink. Docs:
https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/windows.html#working-with-window-results
```
The result of a windowed operation is again a {{DataStream}}, no information 
about the windowed operations is retained in the result elements
```

At the same time, coGroup/join keeps element's timestamps and consecutive 
operators can assign elements to respective windows. Docs:
[https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/joining.html#window-join]
```
Those elements that do get joined will have as their timestamp the largest 
timestamp that still lies in the respective window. For example a window with 
{{[5, 10)}} as its boundaries would result in the joined elements having 9 as 
their timestamp.
```

Business case: 2 streams, 1 for different business metrics, another one - 
similar metrics but from microservices logs, result - reconciliation of these 2 
streams. No other operators except sink are need for this particular business 
case.

> Support 'allowedLateness' in CoGroupedStreams
> -
>
> Key: FLINK-10050
> URL: https://issues.apache.org/jira/browse/FLINK-10050
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.5.1, 1.6.0
>Reporter: eugen yushin
>Priority: Major
>  Labels: ready-to-commit, windows
>
> WindowedStream supports the 'allowedLateness' feature, while CoGroupedStreams 
> does not. At the same time, WindowedStream is an inner part 
> of CoGroupedStreams and all main functionality (like evictor/trigger/...) is 
> simply delegated to WindowedStream.
> There is no way to handle late-arriving data from previous steps in 
> cogroups (and joins). Consider the following flow:
> a. read data from source1 -> aggregate data with allowed lateness
> b. read data from source2 -> aggregate data with allowed lateness
> c. cogroup/join streams a and b, and compare aggregated values
> Step c doesn't accept any late data from steps a/b due to the lack of an 
> `allowedLateness` API call in CoGroupedStreams.java.
> Scope: add the method `WithWindow.allowedLateness` to the Java API 
> (flink-streaming-java) and extend the Scala API (flink-streaming-scala).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10050) Support 'allowedLateness' in CoGroupedStreams

2018-08-24 Thread eugen yushin (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591511#comment-16591511
 ] 

eugen yushin commented on FLINK-10050:
--

[~aljoscha] There's no info about windows for any operator in Flink. Docs:
https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/windows.html#working-with-window-results
```
The result of a windowed operation is again a {{DataStream}}, no information 
about the windowed operations is retained in the result elements
```

At the same time, coGroup/join keeps the elements' timestamps, and consecutive 
operators can assign elements to their respective windows. Docs:
[https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/joining.html#window-join]
```
Those elements that do get joined will have as their timestamp the largest 
timestamp that still lies in the respective window. For example a window with 
{{[5, 10)}} as its boundaries would result in the joined elements having 9 as 
their timestamp.
```

Business case: 2 streams, one with different business metrics, the other with 
similar metrics taken from microservice logs; the result is a reconciliation of 
these 2 streams. No operators other than a sink are needed for this particular 
business case.

> Support 'allowedLateness' in CoGroupedStreams
> -
>
> Key: FLINK-10050
> URL: https://issues.apache.org/jira/browse/FLINK-10050
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.5.1, 1.6.0
>Reporter: eugen yushin
>Priority: Major
>  Labels: ready-to-commit, windows
>
> WindowedStream supports the 'allowedLateness' feature, while CoGroupedStreams 
> does not. At the same time, WindowedStream is an inner part 
> of CoGroupedStreams and all main functionality (like evictor/trigger/...) is 
> simply delegated to WindowedStream.
> There is no way to handle late-arriving data from previous steps in 
> cogroups (and joins). Consider the following flow:
> a. read data from source1 -> aggregate data with allowed lateness
> b. read data from source2 -> aggregate data with allowed lateness
> c. cogroup/join streams a and b, and compare aggregated values
> Step c doesn't accept any late data from steps a/b due to the lack of an 
> `allowedLateness` API call in CoGroupedStreams.java.
> Scope: add the method `WithWindow.allowedLateness` to the Java API 
> (flink-streaming-java) and extend the Scala API (flink-streaming-scala).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10050) Support 'allowedLateness' in CoGroupedStreams

2018-08-13 Thread eugen yushin (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eugen yushin updated FLINK-10050:
-
Affects Version/s: 1.6.0

> Support 'allowedLateness' in CoGroupedStreams
> -
>
> Key: FLINK-10050
> URL: https://issues.apache.org/jira/browse/FLINK-10050
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.5.1, 1.6.0
>Reporter: eugen yushin
>Priority: Major
>  Labels: ready-to-commit, windows
>
> WindowedStream supports the 'allowedLateness' feature, while CoGroupedStreams 
> does not. At the same time, WindowedStream is an inner part 
> of CoGroupedStreams and all main functionality (like evictor/trigger/...) is 
> simply delegated to WindowedStream.
> There is no way to handle late-arriving data from previous steps in 
> cogroups (and joins). Consider the following flow:
> a. read data from source1 -> aggregate data with allowed lateness
> b. read data from source2 -> aggregate data with allowed lateness
> c. cogroup/join streams a and b, and compare aggregated values
> Step c doesn't accept any late data from steps a/b due to the lack of an 
> `allowedLateness` API call in CoGroupedStreams.java.
> Scope: add the method `WithWindow.allowedLateness` to the Java API 
> (flink-streaming-java) and extend the Scala API (flink-streaming-scala).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10050) Support 'allowedLateness' in CoGroupedStreams

2018-08-03 Thread eugen yushin (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eugen yushin updated FLINK-10050:
-
Labels: ready-to-commit windows  (was: )

> Support 'allowedLateness' in CoGroupedStreams
> -
>
> Key: FLINK-10050
> URL: https://issues.apache.org/jira/browse/FLINK-10050
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.5.1
>Reporter: eugen yushin
>Priority: Major
>  Labels: ready-to-commit, windows
>
> WindowedStream supports the 'allowedLateness' feature, while CoGroupedStreams 
> does not. At the same time, WindowedStream is an inner part 
> of CoGroupedStreams and all main functionality (like evictor/trigger/...) is 
> simply delegated to WindowedStream.
> There is no way to handle late-arriving data from previous steps in 
> cogroups (and joins). Consider the following flow:
> a. read data from source1 -> aggregate data with allowed lateness
> b. read data from source2 -> aggregate data with allowed lateness
> c. cogroup/join streams a and b, and compare aggregated values
> Step c doesn't accept any late data from steps a/b due to the lack of an 
> `allowedLateness` API call in CoGroupedStreams.java.
> Scope: add the method `WithWindow.allowedLateness` to the Java API 
> (flink-streaming-java) and extend the Scala API (flink-streaming-scala).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10050) Support 'allowedLateness' in CoGroupedStreams

2018-08-03 Thread eugen yushin (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eugen yushin updated FLINK-10050:
-
Affects Version/s: 1.5.1

> Support 'allowedLateness' in CoGroupedStreams
> -
>
> Key: FLINK-10050
> URL: https://issues.apache.org/jira/browse/FLINK-10050
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.5.1
>Reporter: eugen yushin
>Priority: Major
>
> WindowedStream supports the 'allowedLateness' feature, while CoGroupedStreams 
> does not. At the same time, WindowedStream is an inner part 
> of CoGroupedStreams and all main functionality (like evictor/trigger/...) is 
> simply delegated to WindowedStream.
> There is no way to handle late-arriving data from previous steps in 
> cogroups (and joins). Consider the following flow:
> a. read data from source1 -> aggregate data with allowed lateness
> b. read data from source2 -> aggregate data with allowed lateness
> c. cogroup/join streams a and b, and compare aggregated values
> Step c doesn't accept any late data from steps a/b due to the lack of an 
> `allowedLateness` API call in CoGroupedStreams.java.
> Scope: add the method `WithWindow.allowedLateness` to the Java API 
> (flink-streaming-java) and extend the Scala API (flink-streaming-scala).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10050) Support 'allowedLateness' in CoGroupedStreams

2018-08-03 Thread eugen yushin (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568318#comment-16568318
 ] 

eugen yushin commented on FLINK-10050:
--

I've experimented a bit with `allowedLateness` and cogroups and have an 
implementation. Let's discuss if anyone has concerns about this so I can proceed 
with a PR.

> Support 'allowedLateness' in CoGroupedStreams
> -
>
> Key: FLINK-10050
> URL: https://issues.apache.org/jira/browse/FLINK-10050
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.5.1
>Reporter: eugen yushin
>Priority: Major
>
> WindowedStream supports the 'allowedLateness' feature, while CoGroupedStreams 
> does not. At the same time, WindowedStream is an inner part 
> of CoGroupedStreams and all main functionality (like evictor/trigger/...) is 
> simply delegated to WindowedStream.
> There is no way to handle late-arriving data from previous steps in 
> cogroups (and joins). Consider the following flow:
> a. read data from source1 -> aggregate data with allowed lateness
> b. read data from source2 -> aggregate data with allowed lateness
> c. cogroup/join streams a and b, and compare aggregated values
> Step c doesn't accept any late data from steps a/b due to the lack of an 
> `allowedLateness` API call in CoGroupedStreams.java.
> Scope: add the method `WithWindow.allowedLateness` to the Java API 
> (flink-streaming-java) and extend the Scala API (flink-streaming-scala).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10050) Support 'allowedLateness' in CoGroupedStreams

2018-08-03 Thread eugen yushin (JIRA)
eugen yushin created FLINK-10050:


 Summary: Support 'allowedLateness' in CoGroupedStreams
 Key: FLINK-10050
 URL: https://issues.apache.org/jira/browse/FLINK-10050
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Reporter: eugen yushin


WindowedStream supports the 'allowedLateness' feature, while CoGroupedStreams 
does not. At the same time, WindowedStream is an inner part of 
CoGroupedStreams and all main functionality (like evictor/trigger/...) is 
simply delegated to WindowedStream.
There is no way to handle late-arriving data from previous steps in 
cogroups (and joins). Consider the following flow:
a. read data from source1 -> aggregate data with allowed lateness
b. read data from source2 -> aggregate data with allowed lateness
c. cogroup/join streams a and b, and compare aggregated values

Step c doesn't accept any late data from steps a/b due to the lack of an 
`allowedLateness` API call in CoGroupedStreams.java.

Scope: add the method `WithWindow.allowedLateness` to the Java API 
(flink-streaming-java) and extend the Scala API (flink-streaming-scala).
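
For clarity, here is a sketch of how the proposed API would read at the call site once 
`WithWindow.allowedLateness` exists; left, right, keySelector and MyCoGroupFunction are 
hypothetical placeholders, and only the allowedLateness call is new:

{code:java}
// Hypothetical usage sketch, not the final implementation.
DataStream<String> compared = left
        .coGroup(right)
        .where(keySelector)
        .equalTo(keySelector)
        .window(TumblingEventTimeWindows.of(Time.minutes(5)))
        .allowedLateness(Time.minutes(1))   // the method this ticket proposes to add
        .apply(new MyCoGroupFunction());
{code}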



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)