[jira] [Created] (FLINK-6210) RocksDB instance should be closed in ListViaMergeSpeedMiniBenchmark

2017-03-28 Thread Ted Yu (JIRA)
Ted Yu created FLINK-6210:
-

 Summary: RocksDB instance should be closed in 
ListViaMergeSpeedMiniBenchmark
 Key: FLINK-6210
 URL: https://issues.apache.org/jira/browse/FLINK-6210
 Project: Flink
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


The RocksDB instance should be closed upon returning from main().

ListViaRangeSpeedMiniBenchmark has a similar issue.
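
A minimal sketch of the fix, assuming main() opens the instance itself and that the rocksdbjni version in use supports try-with-resources on its handles (the options and path below are illustrative, not the benchmark's actual setup):

{code}
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class ListViaMergeSpeedMiniBenchmark {
    public static void main(String[] args) throws RocksDBException {
        // try-with-resources closes the native handles on every exit path,
        // including exceptions thrown from the benchmark body
        try (Options options = new Options().setCreateIfMissing(true);
             RocksDB rocksDB = RocksDB.open(options, "/tmp/rocksdb-merge-benchmark")) {
            // ... benchmark body ...
        }
    }
}
{code}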



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Flink on yarn passing yarn config params.

2017-03-28 Thread praveen kanamarlapudi
Hi,



How can I pass a YARN application tag while running Flink on YARN?



Example command: ./bin/flink run -d -m yarn-cluster -yn 4 -yjm 1024 -ytm
4096 examples/batch/WordCount.jar



Thanks,

Praveen


Re: [VOTE] Release Apache Flink 1.2.1 (RC1)

2017-03-28 Thread Haohui Mai
-1 (non-binding)

We recently found out that all jobs submitted via the UI will have a
parallelism of 1, potentially due to FLINK-5808.

Filed FLINK-6209 to track it.

~Haohui

On Mon, Mar 27, 2017 at 2:59 AM Chesnay Schepler  wrote:

> If possible I would like to include FLINK-6183 & FLINK-6184 as well.
>
> They fix 2 metric-related issues that could arise when a Task is
> cancelled very early. (like, right away)
>
> FLINK-6183 fixes a memory leak where the TaskMetricGroup was never closed
> FLINK-6184 fixes a NullPointerException in the buffer metrics
>
> PR here: https://github.com/apache/flink/pull/3611
>
> On 26.03.2017 12:35, Aljoscha Krettek wrote:
> > I opened a PR for FLINK-6188: https://github.com/apache/flink/pull/3616
> 
> >
> > This improves the previously very sparse test coverage for
> timestamp/watermark assigners and fixes the bug.
> >
> >> On 25 Mar 2017, at 10:22, Ufuk Celebi  wrote:
> >>
> >> I agree with Aljoscha.
> >>
> >> -1 because of FLINK-6188
> >>
> >>
> >> On Sat, Mar 25, 2017 at 9:38 AM, Aljoscha Krettek 
> wrote:
> >>> I filed this issue, which was observed by a user:
> https://issues.apache.org/jira/browse/FLINK-6188
> >>>
> >>> I think that’s blocking for 1.2.1.
> >>>
>  On 24 Mar 2017, at 18:57, Ufuk Celebi  wrote:
> 
>  RC1 doesn't contain Stefan's backport for the Asynchronous snapshots
>  for heap-based keyed state that has been merged. Should we create RC2
>  with that fix since the voting period only starts on Monday? I think
>  it would only mean rerunning the scripts on your side, right?
> 
>  – Ufuk
> 
> 
>  On Fri, Mar 24, 2017 at 3:05 PM, Robert Metzger 
> wrote:
> > Dear Flink community,
> >
> > Please vote on releasing the following candidate as Apache Flink
> > version 1.2.1.
> >
> > The commit to be voted on:
> > 732e55bd (http://git-wip-us.apache.org/repos/asf/flink/commit/732e55bd)
> >
> > Branch:
> > release-1.2.1-rc1
> >
> > The release artifacts to be voted on can be found at:
> > http://people.apache.org/~rmetzger/flink-1.2.1-rc1/
> >
> > The release artifacts are signed with the key with fingerprint
> D9839159:
> > http://www.apache.org/dist/flink/KEYS
> >
> > The staging repository for this release can be found at:
> >
> https://repository.apache.org/content/repositories/orgapacheflink-1116
> >
> > -
> >
> >
> > The vote ends on Wednesday, March 29, 2017, 3pm CET.
> >
> >
> > [ ] +1 Release this package as Apache Flink 1.2.1
> > [ ] -1 Do not release this package, because ...
> >
>
>


[jira] [Created] (FLINK-6209) StreamPlanEnvironment always has a parallelism of 1

2017-03-28 Thread Haohui Mai (JIRA)
Haohui Mai created FLINK-6209:
-

 Summary: StreamPlanEnvironment always has a parallelism of 1
 Key: FLINK-6209
 URL: https://issues.apache.org/jira/browse/FLINK-6209
 Project: Flink
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai


Thanks [~bill.liu8904] for triaging the issue.

After FLINK-5808 we saw that Flink jobs uploaded through the UI always have a 
parallelism of 1, even if the parallelism is explicitly set in the UI.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6208) Implement skip till next match strategy

2017-03-28 Thread Dawid Wysakowicz (JIRA)
Dawid Wysakowicz created FLINK-6208:
---

 Summary: Implement skip till next match strategy
 Key: FLINK-6208
 URL: https://issues.apache.org/jira/browse/FLINK-6208
 Project: Flink
  Issue Type: New Feature
  Components: CEP
Reporter: Dawid Wysakowicz
 Fix For: 1.3.0


Right now, we support two strategies (except for looping states):
* skip till any match -> {{followedBy}}
* strict contiguity -> {{next}}

We should also support a strategy that will match only the first occurrence of 
a valid pattern.
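
For reference, a minimal sketch of the two existing strategies in the CEP {{Pattern}} API (assuming the 1.3 condition classes; the {{Event}} type and its {{getType()}} accessor are hypothetical placeholders):

{code}
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;

Pattern<Event, ?> pattern = Pattern.<Event>begin("first")
    .where(new SimpleCondition<Event>() {
        @Override
        public boolean filter(Event e) {
            return e.getType().equals("start");
        }
    })
    // strict contiguity: the matching event must be the immediate successor
    .next("second")
    .where(new SimpleCondition<Event>() {
        @Override
        public boolean filter(Event e) {
            return e.getType().equals("middle");
        }
    })
    // skip till any match: unrelated events may occur in between
    .followedBy("third")
    .where(new SimpleCondition<Event>() {
        @Override
        public boolean filter(Event e) {
            return e.getType().equals("end");
        }
    });
{code}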



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6207) Duplicate type serializers for async snapshots of CopyOnWriteStateTable

2017-03-28 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-6207:
-

 Summary: Duplicate type serializers for async snapshots of 
CopyOnWriteStateTable
 Key: FLINK-6207
 URL: https://issues.apache.org/jira/browse/FLINK-6207
 Project: Flink
  Issue Type: Bug
  Components: State Backends, Checkpointing
Reporter: Stefan Richter
Assignee: Stefan Richter
 Fix For: 1.3.0


{{TypeSerializer}}s are shared between copy-on-write accesses and the parallel 
snapshots in the {{CopyOnWriteStateTable}}. For stateful serializers, this can 
lead to race conditions. Snapshots need to duplicate the serializers before 
using them.
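
A minimal sketch of the proposed duplication, assuming each snapshot can obtain its own instance via {{TypeSerializer#duplicate()}} (the helper method below is illustrative):

{code}
import org.apache.flink.api.common.typeutils.TypeSerializer;

// Hand the async snapshot thread its own serializer instance so a stateful
// serializer is never used concurrently with the processing thread.
// For stateless serializers, duplicate() may simply return "this".
private static <S> TypeSerializer<S> serializerForSnapshot(TypeSerializer<S> stateSerializer) {
    return stateSerializer.duplicate();
}
{code}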



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6206) As an Engineer, I want task state transition log to be warn/error for FAILURE scenarios

2017-03-28 Thread Dan Bress (JIRA)
Dan Bress created FLINK-6206:


 Summary: As an Engineer, I want task state transition log to be 
warn/error for FAILURE scenarios
 Key: FLINK-6206
 URL: https://issues.apache.org/jira/browse/FLINK-6206
 Project: Flink
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.2.0
Reporter: Dan Bress
Priority: Critical


If a task fails due to an exception, I would like that to be logged at warn 
or error level. Currently it is logged at info:

{code}
private boolean transitionState(ExecutionState currentState, ExecutionState newState, Throwable cause) {
    if (STATE_UPDATER.compareAndSet(this, currentState, newState)) {
        if (cause == null) {
            LOG.info("{} ({}) switched from {} to {}.",
                taskNameWithSubtask, executionId, currentState, newState);
        } else {
            LOG.info("{} ({}) switched from {} to {}.",
                taskNameWithSubtask, executionId, currentState, newState, cause);
        }

        return true;
    } else {
        return false;
    }
}
{code}
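
One possible shape of the requested change, as a sketch rather than a committed fix: route transitions into {{FAILED}} to error and keep everything else at info.

{code}
if (cause == null) {
    LOG.info("{} ({}) switched from {} to {}.",
        taskNameWithSubtask, executionId, currentState, newState);
} else if (newState == ExecutionState.FAILED) {
    // failure transitions carry a cause that is worth surfacing prominently
    LOG.error("{} ({}) switched from {} to {}.",
        taskNameWithSubtask, executionId, currentState, newState, cause);
} else {
    LOG.info("{} ({}) switched from {} to {}.",
        taskNameWithSubtask, executionId, currentState, newState, cause);
}
{code}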



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] Project build time and possible restructuring

2017-03-28 Thread Robert Metzger
I think your selection of modules is okay.
Moving out storm and the scala shell would be nice as well. But storm is
not really maintained, so maybe we should consider moving it out of the
Flink repo entirely.
And the scala shell is not a library, but it also doesn't really belong
in the main repo.

Regarding the feature freeze: We either do it with a lot of time in
advance to avoid any delays for the release, OR we do it right after the
release branch has been forked off.



On Tue, Mar 21, 2017 at 1:09 PM, Timo Walther  wrote:

> So what do we want to move to the libraries repository?
>
> I would propose to move these modules first:
>
> flink-cep-scala
> flink-cep
> flink-gelly-examples
> flink-gelly-scala
> flink-gelly
> flink-ml
>
> All other modules (e.g. in flink-contrib) are rather connectors. I think
> it would be better to move those in a connectors repository later.
>
> If we are not in a rush, we could do the moving after the feature-freeze.
> This is the time when most of the PRs will have been merged.
>
> Timo
>
>
> On 20/03/17 at 15:00, Greg Hogan wrote:
>
>> We can add cluster tests using the distribution jar, and will need to do
>> so to remove Flink’s dependency on Hadoop. The YARN and Mesos tests would
>> still run nightly and running cluster tests should be much faster. As
>> troublesome as TravisCI has been, a major driver for this change has been
>> local build time.
>>
>> I agree with splitting off one repo at a time, but we’ll first need to
>> reorganize the core repo if using git submodules as flink-python and
>> flink-table would need to first be moved. So I think planning this out
>> first is a healthy idea, with the understanding that the plan will be
>> reevaluated.
>>
>> Any changes to the project structure need a scheduled period, perhaps a
>> week, for existing pull requests to be reviewed and accepted or closed and
>> later migrated.
>>
>>
>> On Mar 20, 2017, at 6:27 AM, Stephan Ewen  wrote:
>>>
>>> @Greg
>>>
>>> I am personally in favor of splitting "connectors" and "contrib" out as
>>> well. I know that @rmetzger has some reservations about the connectors,
>>> but
>>> we may be able to convince him.
>>>
>>> For the cluster tests (yarn / mesos) - in the past there were many cases
>>> where these tests caught cases that other tests did not, because they are
>>> the only tests that actually use the "flink-dist.jar" and thus discover
>>> many dependency and configuration issues. For that reason, my feeling
>>> would
>>> be that they are valuable in the core repository.
>>>
>>> I would actually suggest to do only the library split initially, to see
>>> what the challenges are in setting up the multi-repo build and release
>>> tooling. Once we gathered experience there, we can probably easily see
>>> what
>>> else we can split out.
>>>
>>> Stephan
>>>
>>>
>>> On Fri, Mar 17, 2017 at 8:37 PM, Greg Hogan  wrote:
>>>
>>> I’d like to use this refactoring opportunity to unsplit the Travis tests.
 With 51 builds queued up for the weekend (some of which may fail or have
 been force pushed) we are at the limit of the number of contributions we
 can process. Fixing this requires 1) splitting the project, 2)
 investigating speedups for long-running tests, and 3) staying cognizant
 of
 test performance when accepting new code.

 I’d like to add one to Stephan’s list of module groups. I like that the
 modules are generic (“libraries”) so that no one module is alone and
 independent.

 Flink has three “libraries”: cep, ml, and gelly.

 “connectors” is a hotspot due to the long-running Kafka tests (and
 connectors for three Kafka versions).

 Both flink-storm and flink-python have a modest number of tests
 and could live with the miscellaneous modules in “contrib”.

 The YARN tests are long-running and problematic (I am unable to
 successfully run these locally). A “cluster” module could host
 flink-mesos,
 flink-yarn, and flink-yarn-tests.

 That gets us close to running all tests in a single Travis build.
   https://travis-ci.org/greghogan/flink/builds/212122590

 I also tested (https://github.com/greghogan/flink/commits/core_build) with a maven
 parallelism of 2 and 4, with the latter a 6.4% drop in build time.
   https://travis-ci.org/greghogan/flink/builds/212137659
   https://travis-ci.org/greghogan/flink/builds/212154470

 We can run Travis CI builds nightly to guard against breaking changes.

 I also wanted to get an idea of how disruptive it would be to developers
 to divide the project into multiple git repos. I wrote a 

[jira] [Created] (FLINK-6205) Put late elements in side output.

2017-03-28 Thread Kostas Kloudas (JIRA)
Kostas Kloudas created FLINK-6205:
-

 Summary: Put late elements in side output.
 Key: FLINK-6205
 URL: https://issues.apache.org/jira/browse/FLINK-6205
 Project: Flink
  Issue Type: Bug
  Components: CEP
Affects Versions: 1.3.0
Reporter: Kostas Kloudas
Assignee: Kostas Kloudas
 Fix For: 1.3.0


Currently the CEP library has a somewhat fuzzy way of handling late events. 
Essentially:
1) it accepts all events (late and early)
2) it sorts them based on event time
3) whenever a watermark arrives, it feeds them into the NFA.

This does not respect event time.

In addition, given that the order in which elements are processed matters, this 
could lead to wrong results as events may be processed by the NFA out-of-order 
with respect to their timestamps.

This issue proposes to assume correctness of the watermark and to consider as 
late any event that arrives with a timestamp smaller than that of the last seen 
watermark. In addition, late events should not be silently dropped; instead, the 
user can specify to send them to a side output, as done in the case of the 
{{WindowOperator}}.
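
For reference, a hedged sketch of the {{WindowOperator}}-style side-output handling that this issue proposes to mirror; {{Event}}, its {{getKey()}} accessor, and the window/function choices are illustrative placeholders, not the eventual CEP API:

{code}
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.OutputTag;

static DataStream<Event> windowWithLateSideOutput(DataStream<Event> input) {
    // anonymous subclass so the element type is captured for the side output
    final OutputTag<Event> lateTag = new OutputTag<Event>("late-events") {};

    SingleOutputStreamOperator<Event> result = input
        .keyBy(e -> e.getKey())
        .window(TumblingEventTimeWindows.of(Time.seconds(10)))
        .sideOutputLateData(lateTag)   // late elements go to the side output
        .reduce((a, b) -> b);          // placeholder window function

    // late events can then be consumed as a regular stream
    return result.getSideOutput(lateTag);
}
{code}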



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Java library for Flink-Kudu integration

2017-03-28 Thread Jinkui Shi
This makes sense. The only problem is that bahir-flink is not very active. Maybe 
more committers need to spend some time on it and come up with a new plan.
We also plan to support carbondata[1] in bahir-flink.

By the way, to keep the code base thin, we could rewrite Flink in Scala step by 
step until Flink 2.0 :)))

[1] https://github.com/apache/incubator-carbondata 

> On 28 Mar 2017, at 17:46, Stephan Ewen wrote:
> 
> We are currently looking into how we can keep the size of the code base
> under control (because it is growing super large).
> 
> Part is moving libraries into a dedicated subrepository (there is a
> separate mailing list thread on that) and some connectors to Bahir.
> Connectors can move between Flink and Bahir, for example it makes sense to
> move heavily worked on connectors to the Flink code base.
> 
> On Tue, Mar 28, 2017 at 10:01 AM, Fabian Hueske  wrote:
> 
>> No, we do not want to move all connectors to Apache Bahir or replace the
>> connectors by Bahir.
>> 
>> The Flink community aims to maintain the most important connectors within
>> Flink.
>> Maintaining all connectors would be a huge effort. So, we decided to move
>> some of the less frequently used connectors to Bahir.
>> 
>> Best, Fabian
>> 
>> 
>> 
>> 2017-03-28 8:31 GMT+02:00 shijinkui :
>> 
>>> Hi, Fabian
>>> Do we have plan to replace Flink connectors with bahir-flink[1]?
>>> 
>>> [1] https://github.com/apache/Bahir-flink
>>> 
>>> On 2017/3/28 at 00:15, "Fabian Hueske" wrote:
>>> 
 Hi Ruben,
 
 thanks for sharing this!
 A Flink Kudu connector is a great contribution and Bahir seems to be the
 right place for it.
 
 Thanks, Fabian
 
 
 2017-03-27 15:35 GMT+02:00 :
 
> Hi all,
> 
> I apologize for sending the email to both accounts, but not sure where
> this topic fits better.
> 
> In my team, we have been working in some PoCs and PoVs about new data
> architectures. As part of this work, we have implemented a library to
> connect Kudu and Flink. The library allows reading/writing from/to Kudu
> tablets using DataSet API and also writing to Kudu using DataStream
>>> API.
> 
> You can find the code and documentation (including some examples) in
> https://github.com/rubencasado/Flink-Kudu
> 
> Any comment/suggestion/contribution is very welcomed ☺
> 
> We will try to publish this contribution to the Apache Bahir project.
> 
> Best
> 
> 
> Rubén Casado Tejedor, PhD
> accenture digital
> Big Data Manager
> +34 629 009 429
> ruben.casado.teje...@accenture.com
> 
> 
> 
> 
>>> 
>>> 
>> 



[jira] [Created] (FLINK-6204) Improve Event-Time OVER ROWS BETWEEN UNBOUNDED PRECEDING aggregation to SQL

2017-03-28 Thread sunjincheng (JIRA)
sunjincheng created FLINK-6204:
--

 Summary: Improve Event-Time OVER ROWS BETWEEN UNBOUNDED PRECEDING 
aggregation to SQL
 Key: FLINK-6204
 URL: https://issues.apache.org/jira/browse/FLINK-6204
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Reporter: sunjincheng
Assignee: sunjincheng


Currently the event-time `OVER ROWS BETWEEN UNBOUNDED PRECEDING` aggregation for 
SQL is implemented in `UnboundedEventTimeOverProcessFunction`, which uses a 
memory data structure of uncontrollable size (`sortedTimestamps: 
util.LinkedList[Long]`) to cache and sort the data timestamps. IMO, this is not 
a good approach, because in our production scenario there are millions of window 
records per millisecond. So, I want to improve it. Welcome anyone to give me 
feedback.
What do you think? [~fhueske] and [~Yuhong_kyo]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Java library for Flink-Kudu integration

2017-03-28 Thread Stephan Ewen
We are currently looking into how we can keep the size of the code base
under control (because it is growing super large).

Part is moving libraries into a dedicated subrepository (there is a
separate mailing list thread on that) and some connectors to Bahir.
Connectors can move between Flink and Bahir, for example it makes sense to
move heavily worked on connectors to the Flink code base.

On Tue, Mar 28, 2017 at 10:01 AM, Fabian Hueske  wrote:

> No, we do not want to move all connectors to Apache Bahir or replace the
> connectors by Bahir.
>
> The Flink community aims to maintain the most important connectors within
> Flink.
> Maintaining all connectors would be a huge effort. So, we decided to move
> some of the less frequently used connectors to Bahir.
>
> Best, Fabian
>
>
>
> 2017-03-28 8:31 GMT+02:00 shijinkui :
>
>> Hi, Fabian
>> Do we have plan to replace Flink connectors with bahir-flink[1]?
>>
>> [1] https://github.com/apache/Bahir-flink
>>
>> On 2017/3/28 at 00:15, "Fabian Hueske" wrote:
>>
>> >Hi Ruben,
>> >
>> >thanks for sharing this!
>> >A Flink Kudu connector is a great contribution and Bahir seems to be the
>> >right place for it.
>> >
>> >Thanks, Fabian
>> >
>> >
>> >2017-03-27 15:35 GMT+02:00 :
>> >
>> >> Hi all,
>> >>
>> >> I apologize for sending the email to both accounts, but not sure where
>> >> this topic fits better.
>> >>
>> >> In my team, we have been working in some PoCs and PoVs about new data
>> >> architectures. As part of this work, we have implemented a library to
>> >> connect Kudu and Flink. The library allows reading/writing from/to Kudu
>> >> tablets using DataSet API and also writing to Kudu using DataStream
>> API.
>> >>
>> >> You can find the code and documentation (including some examples) in
>> >> https://github.com/rubencasado/Flink-Kudu
>> >>
>> >> Any comment/suggestion/contribution is very welcomed ☺
>> >>
>> >> We will try to publish this contribution to the Apache Bahir project.
>> >>
>> >> Best
>> >>
>> >> 
>> >> Rubén Casado Tejedor, PhD
>> >> accenture digital
>> >> Big Data Manager
>> >> +34 629 009 429
>> >> ruben.casado.teje...@accenture.com
>> >>
>> >> 
>> >>
>> >>
>>
>>
>


[jira] [Created] (FLINK-6203) DataSet Transformations

2017-03-28 Thread JIRA
苏拓 created FLINK-6203:
-

 Summary: DataSet Transformations
 Key: FLINK-6203
 URL: https://issues.apache.org/jira/browse/FLINK-6203
 Project: Flink
  Issue Type: Bug
  Components: DataSet API
Affects Versions: 1.2.0
Reporter: 苏拓
Priority: Minor
 Fix For: 1.2.0


The example of GroupReduce on sorted groups can't remove duplicate Strings in a 
DataSet; "prev = t" needs to be added, such as:
val output = input.groupBy(0).sortGroup(1, Order.ASCENDING).reduceGroup {
  (in, out: Collector[(Int, String)]) =>
    var prev: (Int, String) = null
    for (t <- in) {
      if (prev == null || prev != t)
        out.collect(t)
      prev = t  // remember the last element, otherwise duplicates are emitted
    }
}




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Figuring out when a job has successfully restored state

2017-03-28 Thread Gyula Fóra
Hi,

Another thought I had last night: maybe we could have another state for
recovering jobs in the future.
Deploying -> Recovering -> Running
This recovering state might only be applicable for state backends that have
to be restored before processing can start; lazy state backends (like
external databases) might go into the processing state "directly".

What do you think? (I'm ccing dev)
Gyula

Gyula Fóra wrote (on Mon, 27 Mar 2017, at 17:06):

> Hi all,
>
> I am trying to figure out the best way to tell when a job has successfully
> restored all state and started processing.
>
> My first idea was to check the REST API and the number of processed bytes
> for each parallel operator; if that's greater than 0, it started.
> Unfortunately this logic fails if the operator doesn't receive any input for
> some time.
>
> Do we have any info like this exposed somewhere in a nicely queryable way?
>
> Thanks,
> Gyula
>


Re: Java library for Flink-Kudu integration

2017-03-28 Thread Fabian Hueske
No, we do not want to move all connectors to Apache Bahir or replace the
connectors by Bahir.

The Flink community aims to maintain the most important connectors within
Flink.
Maintaining all connectors would be a huge effort. So, we decided to move
some of the less frequently used connectors to Bahir.

Best, Fabian



2017-03-28 8:31 GMT+02:00 shijinkui :

> Hi, Fabian
> Do we have plan to replace Flink connectors with bahir-flink[1]?
>
> [1] https://github.com/apache/Bahir-flink
>
> On 2017/3/28 at 00:15, "Fabian Hueske" wrote:
>
> >Hi Ruben,
> >
> >thanks for sharing this!
> >A Flink Kudu connector is a great contribution and Bahir seems to be the
> >right place for it.
> >
> >Thanks, Fabian
> >
> >
> >2017-03-27 15:35 GMT+02:00 :
> >
> >> Hi all,
> >>
> >> I apologize for sending the email to both accounts, but not sure where
> >> this topic fits better.
> >>
> >> In my team, we have been working in some PoCs and PoVs about new data
> >> architectures. As part of this work, we have implemented a library to
> >> connect Kudu and Flink. The library allows reading/writing from/to Kudu
> >> tablets using DataSet API and also writing to Kudu using DataStream API.
> >>
> >> You can find the code and documentation (including some examples) in
> >> https://github.com/rubencasado/Flink-Kudu
> >>
> >> Any comment/suggestion/contribution is very welcomed ☺
> >>
> >> We will try to publish this contribution to the Apache Bahir project.
> >>
> >> Best
> >>
> >> 
> >> Rubén Casado Tejedor, PhD
> >> accenture digital
> >> Big Data Manager
> >> +34 629 009 429
> >> ruben.casado.teje...@accenture.com
> >>
> >> 
> >>
> >>
>
>


Re: Java library for Flink-Kudu integration

2017-03-28 Thread shijinkui
Hi, Fabian
Do we have plan to replace Flink connectors with bahir-flink[1]?

[1] https://github.com/apache/Bahir-flink

On 2017/3/28 at 00:15, "Fabian Hueske" wrote:

>Hi Ruben,
>
>thanks for sharing this!
>A Flink Kudu connector is a great contribution and Bahir seems to be the
>right place for it.
>
>Thanks, Fabian
>
>
>2017-03-27 15:35 GMT+02:00 :
>
>> Hi all,
>>
>> I apologize for sending the email to both accounts, but not sure where
>> this topic fits better.
>>
>> In my team, we have been working in some PoCs and PoVs about new data
>> architectures. As part of this work, we have implemented a library to
>> connect Kudu and Flink. The library allows reading/writing from/to Kudu
>> tablets using DataSet API and also writing to Kudu using DataStream API.
>>
>> You can find the code and documentation (including some examples) in
>> https://github.com/rubencasado/Flink-Kudu
>>
>> Any comment/suggestion/contribution is very welcomed ☺
>>
>> We will try to publish this contribution to the Apache Bahir project.
>>
>> Best
>>
>> 
>> Rubén Casado Tejedor, PhD
>> accenture digital
>> Big Data Manager
>> +34 629 009 429
>> ruben.casado.teje...@accenture.com
>>
>> 
>>
>>



[jira] [Created] (FLINK-6202) YarnSessionCli couldn't submit jobs when using -z option

2017-03-28 Thread Syinchwun Leo (JIRA)
Syinchwun Leo created FLINK-6202:


 Summary: YarnSessionCli couldn't submit jobs when using -z option
 Key: FLINK-6202
 URL: https://issues.apache.org/jira/browse/FLINK-6202
 Project: Flink
  Issue Type: Bug
  Components: Client
Affects Versions: 1.2.0
Reporter: Syinchwun Leo


(1) When using the following command to specify a ZooKeeper namespace:
bin/yarn-session.sh -n 3 -z YARN1
and then submitting a job with bin/flink run like this:
bin/flink run examples/streaming/WindowJoin.jar

the job cannot find the JobManager.

When "flink run" detects an existing properties file, the user gets a 
FlinkYarnSessionCli, and when retrieveCluster() uses the FlinkYarnSessionCli, 
zkNamespace is set according to the "-yz" option or the application ID. If the 
user does not pass -yz, zkNamespace is set to the application ID, and there is 
no information in ZooKeeper under the node named after the application ID (the 
namespace was already set according to -z in yarn-session.sh).
(2) But when using -yz, FlinkYarnSessionCli's isActive() will return false 
because the loadYarnPropertiesFile() method seems to only allow the DETACHED 
option. The user will get a standalone client.
(3) When using -z, FlinkYarnSessionCli will not recognize the option, because 
CliFrontend has already set FlinkYarnSessionCli's shortPrefix to "y" and only 
recognizes "-y*" options.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [jira] [Created] (FLINK-6167) Consider removing ArchivedExecutionGraph classes

2017-03-28 Thread David Wang
Hi Chesnay,

If the Archived* versions of the ExecutionGraph classes are removed, is there
any way to retrieve data in the ExecutionGraph from the JobMaster via RPC? Some
of our projects rely on the Archived* classes from JobMasters running in the
cluster.

Thanks,
David

2017-03-22 22:11 GMT+08:00 Chesnay Schepler (JIRA) :

> Chesnay Schepler created FLINK-6167:
> ---
>
>  Summary: Consider removing ArchivedExecutionGraph classes
>  Key: FLINK-6167
>  URL: https://issues.apache.org/jira/browse/FLINK-6167
>  Project: Flink
>   Issue Type: Improvement
>   Components: JobManager
> Affects Versions: 1.3.0
> Reporter: Chesnay Schepler
> Priority: Minor
>
>
> The Archived* versions of the ExecutionGraph classes (ExecutionGraph,
> ExecutionJobVertex, ExecutionVertex, Execution) were originally intended to
> provide a serializable object that can be transferred to the History Server.
>
> The revised implementation of the history server however no longer
> requires them.
>
> As such we could either remove them, or keep them for testing purposes
> (instead of mocking) as they simplify the testing of the web-interface
> handlers quite a lot, which would however require keeping the Access*
> interfaces.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.15#6346)
>