Re: Flink with Yarn on MapR

2017-01-20 Thread Robert Metzger
Hi Aniket,

The first error you are reporting is not a big deal. I think there is some
whitespace or similar in the default Flink config file that causes the parser
to print the message.

The second one is tougher to fix. It seems that there is an issue with
loading the Hadoop configuration correctly.
Can you post the contents of the client log file from the "log/" directory?
It contains, for example, the Hadoop version being used (maybe it didn't
correctly pick up the custom Hadoop version) and possibly some helpful WARN
log messages (because our YARN client does some checks before starting).
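
For reference, a minimal sketch of why this particular NumberFormatException
shows up (the property value is taken from the stack trace below; everything
else is illustrative): Hadoop's Configuration expands ${...} references
against other config entries and system properties, and if the referenced key
is undefined, the raw placeholder string is returned, which getInt() then
fails to parse.

{code}
import org.apache.hadoop.conf.Configuration;

public class VCoresProbe {
    public static void main(String[] args) {
        Configuration conf = new Configuration(false);
        // Simulates a yarn-site.xml entry whose ${...} variable is never defined:
        conf.set("yarn.nodemanager.resource.cpu-vcores",
                 "${nodemanager.resource.cpu-vcores}");
        // Throws java.lang.NumberFormatException:
        //   For input string: "${nodemanager.resource.cpu-vcores}"
        int vcores = conf.getInt("yarn.nodemanager.resource.cpu-vcores", 8);
        System.out.println(vcores);
    }
}
{code}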


Regards,
Robert



On Fri, Jan 20, 2017 at 12:27 AM, Aniket Deshpande 
wrote:

> Hi,
> I am trying to setup flink with Yarn on Mapr cluster. I built flink
> (flink-1.3-SNAPSHOT) as follows:
>
> mvn clean install -DskipTests -Pvendor-repos -Dhadoop.version=2.7.0-mapr-1607
>
> The build is successful. Then I try to run ./bin/yarn-session.sh -n 4
> (without changing any config whatsoever) and get the following two
> errors:
>
>
> 1.   This one is a minor error (or bug?)
>
> Error while trying to split key and value in configuration file
> /conf/flink-conf.yaml:
>
>
>
> 2.   Second error is more serious and as follows:
>
>
>
> Error while deploying YARN cluster: Couldn't deploy Yarn cluster
>
> java.lang.RuntimeException: Couldn't deploy Yarn cluster
> at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:425)
> at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:620)
> at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:476)
> at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:473)
> at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
> at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
> at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:473)
> Caused by: java.lang.NumberFormatException: For input string: "${nodemanager.resource.cpu-vcores}"
> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Integer.parseInt(Integer.java:569)
> at java.lang.Integer.parseInt(Integer.java:615)
> at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1271)
> at org.apache.flink.yarn.AbstractYarnClusterDescriptor.isReadyForDeployment(AbstractYarnClusterDescriptor.java:315)
> at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:434)
> at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:423)
> ... 9 more
>
>
>
> Now, the property that is causing this error, nodemanager.resource.cpu-vcores,
> is appropriately set in yarn-site.xml. The cluster has 3 ResourceManagers (2
> on standby) and 5 NodeManagers. To be extra safe, I changed the value for
> this property in ALL the NodeManagers' yarn-site.xml files.
>
> I believe that this property defaults to 4 according to this blog [
> https://www.mapr.com/blog/best-practices-yarn-resource-management ]. So I
> am trying to understand why this error is cropping up.
>
> The required environment variable is set as follows:
>
> YARN_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/
>
>
>
> I also tried setting the fs.hdfs.hadoopconf property (to point to the
> Hadoop conf directory) in flink-conf.yaml, but I still get the same error.
>
>
>
> Any help with these (especially the latter) errors would be greatly
> appreciated.
>
>
> Thanks in advance,
>
> Aniket D
>


[jira] [Created] (FLINK-5585) NullPointer Exception in JobManager.updateAccumulators

2017-01-20 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-5585:
---

 Summary: NullPointer Exception in JobManager.updateAccumulators
 Key: FLINK-5585
 URL: https://issues.apache.org/jira/browse/FLINK-5585
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 1.1.4, 1.2.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.2.0, 1.3.0, 1.1.5


{code}
NullPointerException

at JobManager$$updateAccumulators$1.apply(JobManager.scala:1790)
at JobManager$$updateAccumulators$1.apply(JobManager.scala:1788)
at scala.collection.mutable.ResizableArray$class.foreach(ArrayBuffer.scala:48)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$updateAccumulators(JobManager.scala:1788)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:967)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: States split over to external storage

2017-01-20 Thread Stephan Ewen
Hi!

This is an interesting suggestion.
Just to make sure I understand it correctly: Do you design this for cases
where the state per machine is larger than that machine's memory/disk? And
in that case, you cannot solve the problem by scaling out (having more
machines)?

Stephan


On Tue, Jan 17, 2017 at 6:29 AM, Chen Qin  wrote:

> Hi there,
>
> I would like to discuss spilling local state over to external storage. The
> use case is NOT another external state backend like HDFS, but rather to
> expand beyond what local disk/memory can hold when a large key space exceeds
> what task managers can handle. Realizing FLINK-4266 might be hard to tackle
> all-in-one, so I would like to give split-over a shot first.
>
> An intuitive approach would be to treat the HeapStateBackend as an LRU cache
> and spill over to external key/value storage when a threshold is triggered.
> To make this happen, we need a minor refactoring of the runtime and to add
> set/get logic. One nice thing about keeping HDFS for storing snapshots is
> avoiding versioning conflicts. Once a checkpoint restore happens, partially
> written data will be overwritten with the previously checkpointed value.
>
> Comments?
>
> --
> -Chen Qin
>
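
As a rough illustration of the split-over idea above (all class names are
hypothetical; this is not Flink code), the LRU spilling could look like:

{code}
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: keep the hottest entries on the heap, spill the least recently
// used ones to an external key/value store once a threshold is exceeded.
public class SpillingLruState<K, V> extends LinkedHashMap<K, V> {
    private final int maxInMemoryEntries;
    private final Map<K, V> externalStore; // stands in for an external K/V store

    public SpillingLruState(int maxInMemoryEntries, Map<K, V> externalStore) {
        super(16, 0.75f, true); // access-order iteration = LRU eviction order
        this.maxInMemoryEntries = maxInMemoryEntries;
        this.externalStore = externalStore;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > maxInMemoryEntries) {
            externalStore.put(eldest.getKey(), eldest.getValue()); // spill, don't drop
            return true;
        }
        return false;
    }

    public V lookup(K key) {
        V v = get(key);
        return (v != null) ? v : externalStore.get(key); // fall back to external store
    }
}
{code}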


Re: States split over to external storage

2017-01-20 Thread Fabian Hueske
If I got it correctly, part of the motivation is to move rarely used / cold
state to an external storage (please correct me if I'm wrong).

2017-01-20 11:35 GMT+01:00 Stephan Ewen :

> Hi!
>
> This is an interesting suggestion.
> Just to make sure I understand it correctly: Do you design this for cases
> where the state per machine is larger than that machine's memory/disk? And
> in that case, you cannot solve the problem by scaling out (having more
> machines)?
>
> Stephan
>
>
> On Tue, Jan 17, 2017 at 6:29 AM, Chen Qin  wrote:
>
> > Hi there,
> >
> > I would like to discuss spilling local state over to external storage. The
> > use case is NOT another external state backend like HDFS, but rather to
> > expand beyond what local disk/memory can hold when a large key space
> > exceeds what task managers can handle. Realizing FLINK-4266 might be hard
> > to tackle all-in-one, so I would like to give split-over a shot first.
> >
> > An intuitive approach would be to treat the HeapStateBackend as an LRU
> > cache and spill over to external key/value storage when a threshold is
> > triggered. To make this happen, we need a minor refactoring of the runtime
> > and to add set/get logic. One nice thing about keeping HDFS for storing
> > snapshots is avoiding versioning conflicts. Once a checkpoint restore
> > happens, partially written data will be overwritten with the previously
> > checkpointed value.
> >
> > Comments?
> >
> > --
> > -Chen Qin
> >
>


Re: [DISCUSS] Proposed updates to Flink project site

2017-01-20 Thread Mike Winters
Yes, thanks to all for helping! There are still some comments / ideas from
this thread (such as a navigation with sub-menus, a cleaner nav menu on the
mobile site, reworking some of the information-dense pages) that I think
should be addressed but haven't been yet, and so I'll be making updates on
an ongoing basis.

On Wed, Jan 18, 2017 at 3:59 PM, Fabian Hueske  wrote:

> Thanks everybody for working on the website!
>
> 2017-01-18 15:24 GMT+01:00 Robert Metzger :
>
> > Cool, thank you for merging!
> >
> > Once we've got enough feedback here that it's working, I'll also tweet
> about
> > it from @ApacheFlink.
> >
> > On Wed, Jan 18, 2017 at 3:17 PM, Ufuk Celebi  wrote:
> >
> > > The updated page has been merged: http://flink.apache.org/
> > >
> > > Would appreciate it if you took a minute to just browse the web page
> > > and look out for any left over errors.
> > >
> > > On Tue, Jan 10, 2017 at 2:35 PM, Till Rohrmann 
> > > wrote:
> > > > Great work Mike :-) I like the new web page a lot.
> > > >
> > > > Some comments:
> > > >
> > > > - Can we put in a higher-resolution Flink logo picture? It does not
> > > > look good on my monitor or on mobile.
> > > > - When browsing the web page on my phone, I noticed that all nav bar
> > > > items start at the very left border. Maybe some kind of padding would
> > > > be good there. (Mobile, using the Chrome browser.)
> > > > - When browsing to a page with anchors (e.g. Introduction to Flink), we
> > > > show all the different anchor links at the very beginning of the page.
> > > > I think it could be helpful to also show these links in the nav bar if
> > > > this page is active.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Tue, Jan 10, 2017 at 2:22 PM, Kostas Tzoumas  >
> > > wrote:
> > > >
> > > >> Love it! I would merge this asap
> > > >>
> > > >> Some comments (not blockers)
> > > >>
> > > >> - I was looking for the downloads page for a minute, until I saw that
> > > >> it is linked via the big blue button :) Did anyone have the same
> > > >> problem?
> > > >> - Do we still link to the wiki?
> > > >> - Maybe too much info is buried under the Community & Project Info
> > > >> category?
> > > >>
> > > >> On Tue, Jan 10, 2017 at 1:27 PM, Ufuk Celebi 
> wrote:
> > > >>
> > > >>> Thanks for addressing my comments. I really like the new web site!
> > > >>>
> > > >>> If there are no objections, I would like to merge this later today.
> > > >>>
> > > >>>
> > > >>> On Tue, Jan 10, 2017 at 11:56 AM, Mike Winters 
> > > wrote:
> > > >>> > Thanks for the comments, Ufuk.
> > > >>> >
> > > >>> > -I rearranged the front page so that the blog posts come before
> > > 'Powered
> > > >>> > By' logos--I think this makes sense.
> > > >>> > -Nav bar highlights issue is fixed
> > > >>> >
> > > >>> > Just pushed these changes, and the preview site (
> > > >>> > https://wints.github.io/flink-web/) has been updated as well.
> > > >>> >
> > > >>> > Best,
> > > >>> > Mike
> > > >>> >
> > > >>> > On Mon, Jan 9, 2017 at 6:12 PM, Ufuk Celebi 
> > wrote:
> > > >>> >
> > > >>> >> Looks great!
> > > >>> >>
> > > >>> >> Some minor comments:
> > > >>> >>
> > > >>> >> - Front page: Maybe put the latest blog posts before the powered-by
> > > >>> >> logos?
> > > >>> >> - Nav bar on the left: I noticed that for some links the active
> > > >>> >> color does not show. Clicking on "Blog" highlights the link, but
> > > >>> >> not others like "Powered By".
> > > >>> >>
> > > >>> >> On Mon, Jan 9, 2017 at 5:08 PM, Mike Winters 
> > > wrote:
> > > >>> >> > Hi, you can also now preview the site here:
> > > >>> >> > https://wints.github.io/flink-web/.
> > > >>> >> >
> > > >>> >> > -Mike
> > > >>> >> >
> > > >>> >> > On Mon, Jan 9, 2017 at 3:07 PM, Mike Winters <
> mwi...@gmail.com>
> > > >>> wrote:
> > > >>> >> >
> > > >>> >> >> Hi everyone,
> > > >>> >> >>
> > > >>> >> >> For the sake of having many sets of eyes to help catch
> > potential
> > > >>> >> issues, I
> > > >>> >> >> decided to wait until after the new year to create a PR for
> the
> > > >>> updated
> > > >>> >> >> Flink site.
> > > >>> >> >>
> > > >>> >> >> You can find it here: https://github.com/apache/
> > > flink-web/pull/44
> > > >>> >> >>
> > > >>> >> >> Please share any feedback!
> > > >>> >> >>
> > > >>> >> >> Thanks,
> > > >>> >> >> Mike
> > > >>> >> >>
> > > >>> >> >> On Fri, Dec 2, 2016 at 6:39 PM, Maximilian Michels <
> > > m...@apache.org>
> > > >>> >> wrote:
> > > >>> >> >>
> > > >>> >> >>> The changes look great, Mike! Here's a quick screenshot of
> the
> > > >>> front
> > > >>> >> page:
> > > >>> >> >>> http://pasteboard.co/images/51Tqpm8Ke.png
> > > >>> >> >>>
> > > >>> >> >>>
> > > >>> >> >>>
> > > >>> >> >>>
> > > >>> >> >>>
> > > >>> >> >>> A quick one-liner to paste in your terminal to open a browser
> > > >>> >> >>> window with the changes:
> > > >>> >> >>>
> > > >>> >> >>> git clone https://github.com/wints/flink-web-updates web-updates && cd
> > > >>> >> >>> web-update

Re: States split over to external storage

2017-01-20 Thread liuxinchun
I think the current backup strategy checkpoints the whole window every time;
when the window size is very large, this is time- and storage-consuming.
An incremental policy should be considered.

Sent from HUAWEI AnyOffice
From: Stephan Ewen
To: dev@flink.apache.org
Cc: iuxinc...@huawei.com, Aljoscha Krettek, 时金魁
Date: 2017-01-20 18:35:46
Subject: Re: States split over to external storage

Hi!

This is an interesting suggestion.
Just to make sure I understand it correctly: Do you design this for cases
where the state per machine is larger than that machine's memory/disk? And
in that case, you cannot solve the problem by scaling out (having more
machines)?

Stephan


On Tue, Jan 17, 2017 at 6:29 AM, Chen Qin  wrote:

> Hi there,
>
> I would like to discuss spilling local state over to external storage. The
> use case is NOT another external state backend like HDFS, but rather to
> expand beyond what local disk/memory can hold when a large key space exceeds
> what task managers can handle. Realizing FLINK-4266 might be hard to tackle
> all-in-one, so I would like to give split-over a shot first.
>
> An intuitive approach would be to treat the HeapStateBackend as an LRU cache
> and spill over to external key/value storage when a threshold is triggered.
> To make this happen, we need a minor refactoring of the runtime and to add
> set/get logic. One nice thing about keeping HDFS for storing snapshots is
> avoiding versioning conflicts. Once a checkpoint restore happens, partially
> written data will be overwritten with the previously checkpointed value.
>
> Comments?
>
> --
> -Chen Qin
>


Reply: States split over to external storage

2017-01-20 Thread liuxinchun
What's more, I made a small change in WindowOperator for ListState in

https://issues.apache.org/jira/browse/FLINK-5572


From: Stephan Ewen
To: dev
Cc: iuxinc...@huawei.com, Aljoscha Krettek, 时金魁
Date: 2017-01-20 18:35:46
Subject: Re: States split over to external storage

Hi!

This is an interesting suggestion.
Just to make sure I understand it correctly: Do you design this for cases
where the state per machine is larger than that machine's memory/disk? And
in that case, you cannot solve the problem by scaling out (having more
machines)?

Stephan


On Tue, Jan 17, 2017 at 6:29 AM, Chen Qin  wrote:

> Hi there,
>
> I would like to discuss spilling local state over to external storage. The
> use case is NOT another external state backend like HDFS, but rather to
> expand beyond what local disk/memory can hold when a large key space exceeds
> what task managers can handle. Realizing FLINK-4266 might be hard to tackle
> all-in-one, so I would like to give split-over a shot first.
>
> An intuitive approach would be to treat the HeapStateBackend as an LRU cache
> and spill over to external key/value storage when a threshold is triggered.
> To make this happen, we need a minor refactoring of the runtime and to add
> set/get logic. One nice thing about keeping HDFS for storing snapshots is
> avoiding versioning conflicts. Once a checkpoint restore happens, partially
> written data will be overwritten with the previously checkpointed value.
>
> Comments?
>
> --
> -Chen Qin
>


Re: States split over to external storage

2017-01-20 Thread Stephan Ewen
There is work underway on different approaches to incremental policies
(more soon in some design proposals), but the point raised here sounded
different to me.

Maybe Chen Qin can describe in some more detail what he was having in
mind...

On Fri, Jan 20, 2017 at 12:15 PM, liuxinchun  wrote:

> I think the current backup strategy checkpoints the whole window every
> time; when the window size is very large, this is time- and
> storage-consuming. An incremental policy should be considered.
>
> Sent from HUAWEI AnyOffice
> *From:* Stephan Ewen
> *To:* dev@flink.apache.org
> *Cc:* iuxinc...@huawei.com, Aljoscha Krettek, 时金魁
> *Date:* 2017-01-20 18:35:46
> *Subject:* Re: States split over to external storage
>
> Hi!
>
> This is an interesting suggestion.
> Just to make sure I understand it correctly: Do you design this for cases
> where the state per machine is larger than that machine's memory/disk? And
> in that case, you cannot solve the problem by scaling out (having more
> machines)?
>
> Stephan
>
>
> On Tue, Jan 17, 2017 at 6:29 AM, Chen Qin  wrote:
>
> > Hi there,
> >
> > I would like to discuss spilling local state over to external storage. The
> > use case is NOT another external state backend like HDFS, but rather to
> > expand beyond what local disk/memory can hold when a large key space
> > exceeds what task managers can handle. Realizing FLINK-4266 might be hard
> > to tackle all-in-one, so I would like to give split-over a shot first.
> >
> > An intuitive approach would be to treat the HeapStateBackend as an LRU
> > cache and spill over to external key/value storage when a threshold is
> > triggered. To make this happen, we need a minor refactoring of the runtime
> > and to add set/get logic. One nice thing about keeping HDFS for storing
> > snapshots is avoiding versioning conflicts. Once a checkpoint restore
> > happens, partially written data will be overwritten with the previously
> > checkpointed value.
> >
> > Comments?
> >
> > --
> > -Chen Qin
> >
>


[jira] [Created] (FLINK-5586) Extend TableProgramsTestBase for object reuse modes

2017-01-20 Thread Timo Walther (JIRA)
Timo Walther created FLINK-5586:
---

 Summary: Extend TableProgramsTestBase for object reuse modes
 Key: FLINK-5586
 URL: https://issues.apache.org/jira/browse/FLINK-5586
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Reporter: Timo Walther


We should also test if all runtime operators of the Table API work correctly if 
object reuse mode is set to true. This should be done for all cluster-based 
ITCases, not the collection-based ones.
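
For reference, a minimal sketch of how a test base can toggle the mode
(enableObjectReuse() is the existing ExecutionConfig switch; the surrounding
setup is illustrative):

{code}
import org.apache.flink.api.java.ExecutionEnvironment;

public class ObjectReuseSetup {
    public static void main(String[] args) {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // With object reuse enabled, the runtime may recycle record instances
        // between function calls instead of creating fresh copies.
        env.getConfig().enableObjectReuse();
    }
}
{code}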



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5587) AsyncWaitOperatorTest timed out on Travis

2017-01-20 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5587:
--

 Summary: AsyncWaitOperatorTest timed out on Travis
 Key: FLINK-5587
 URL: https://issues.apache.org/jira/browse/FLINK-5587
 Project: Flink
  Issue Type: Test
Reporter: Ufuk Celebi


The Maven watch dog script cancelled the build and printed a stack trace for 
{{AsyncWaitOperatorTest.testOperatorChainWithProcessingTime(AsyncWaitOperatorTest.java:379)}}.

https://s3.amazonaws.com/archive.travis-ci.org/jobs/192441719/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5589) Tumbling group windows on batch tables do not consider object reuse

2017-01-20 Thread Timo Walther (JIRA)
Timo Walther created FLINK-5589:
---

 Summary: Tumbling group windows on batch tables do not consider 
object reuse
 Key: FLINK-5589
 URL: https://issues.apache.org/jira/browse/FLINK-5589
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Reporter: Timo Walther


The tumbling group windows on batch tables might not work properly when object 
reuse mode is enabled.

See: 
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/batch/index.html#object-reuse-enabled

"Do not remember input objects received from an Iterable."



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5588) Add a unit scaler based on different norms

2017-01-20 Thread Stavros Kontopoulos (JIRA)
Stavros Kontopoulos created FLINK-5588:
--

 Summary: Add a unit scaler based on different norms
 Key: FLINK-5588
 URL: https://issues.apache.org/jira/browse/FLINK-5588
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Stavros Kontopoulos
Priority: Minor


So far, FlinkML has two scalers: the min-max and the standard scaler.
A third commonly used one is the scaler to unit norm.
We could implement a transformer for this type of scaling with different norms 
available to the user.

Resources
[1] https://en.wikipedia.org/wiki/Feature_scaling
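
For illustration, scaling to a unit p-norm means x' = x / ||x||_p, so the 
scaled vector has norm 1. A minimal sketch (this is not the FlinkML 
transformer API):

{code}
public class UnitScaler {
    // Scale a vector to unit p-norm; p = 2 gives the Euclidean case.
    static double[] scaleToUnit(double[] x, double p) {
        double norm = 0.0;
        for (double v : x) {
            norm += Math.pow(Math.abs(v), p);
        }
        norm = Math.pow(norm, 1.0 / p);
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = x[i] / norm;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] scaled = scaleToUnit(new double[]{3.0, 4.0}, 2.0);
        System.out.println(scaled[0] + ", " + scaled[1]); // 0.6, 0.8
    }
}
{code}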



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5590) Create a proper internal state hierarchy

2017-01-20 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-5590:
---

 Summary: Create a proper internal state hierarchy
 Key: FLINK-5590
 URL: https://issues.apache.org/jira/browse/FLINK-5590
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing
Affects Versions: 1.2.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.3.0


Currently, the state interfaces (like {{ListState}}, {{ValueState}}, 
{{ReducingState}}) are very sparse and contain only methods exposed to the 
users. That makes sense in order to keep the public, stable API minimal.

At the same time, the runtime needs more methods for its internal interaction 
with state, such as:
  - setting namespaces
  - accessing raw values
  - merging namespaces

These are currently realized by re-creating or re-obtaining the state objects 
from the KeyedStateBackend. That approach causes quite an overhead for each 
access to the state.

The KeyedStateBackend tries to do some tricks to reduce that overhead, but does 
so only partially and induces other overhead in the process.

The root cause of all these issues is a problem in the design: there is no 
proper "internal state abstraction" in the same way as there is an external 
state abstraction (the public state API).

We should add a similar hierarchy of states for the internal methods. It would 
look like the example below:

{code}
 *              State
 *                |
 *                +---------------------InternalKvState
 *                |                           |
 *           MergingState                     |
 *                |                           |
 *                +-------------------InternalMergingState
 *                |                           |
 *       +--------+--------+            +-----+-----+
 *       |                 |            |           |
 *  ReducingState      ListState        |           |
 *       |                 |            |           |
 *       |                 +------InternalListState |
 *       |                                          |
 *       +----------------------------InternalReducingState
{code}
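
As a rough illustration of the direction (method names indicative only, not 
the final design), an internal interface would add namespace handling and raw 
value access on top of the public API:

{code}
import org.apache.flink.api.common.state.State;

// Sketch of internal-only methods layered on top of the public State API.
public interface InternalKvState<N> extends State {

    // internal: select the namespace for subsequent state accesses
    void setCurrentNamespace(N namespace);

    // internal: raw value access, e.g. for queryable state
    byte[] getSerializedValue(byte[] serializedKeyAndNamespace) throws Exception;
}
{code}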



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5591) queryable state: request failures failing on the server side do not contain client stack trace

2017-01-20 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5591:
--

 Summary: queryable state: request failures failing on the server 
side do not contain client stack trace
 Key: FLINK-5591
 URL: https://issues.apache.org/jira/browse/FLINK-5591
 Project: Flink
  Issue Type: Improvement
  Components: Queryable State
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber


Failures during queries using QueryableStateClient throw exceptions like 
UnknownKeyOrNamespace but these are thrown on the server side with its 
stacktrace included, then serialised and sent over to the client where they are 
deserialised and re-thrown.
For debugging it would help a lot if these "server exceptions" were 
encapsulated in new client-exception instances so that we also get the client's 
stack trace. Especially since failing requests may be caused by the client 
itself, e.g. non-existing jobs/keys, wrong namespace/serialisers, etc.
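
A minimal, self-contained sketch of the wrapping idea (the deserialization 
helper is fabricated purely to demonstrate the pattern):

{code}
public final class ClientExceptionWrapping {

    // Stands in for the client's deserialization of the server-side failure.
    static Throwable deserializeServerFailure() {
        return new IllegalStateException("UnknownKeyOrNamespace (server stack trace)");
    }

    public static void main(String[] args) {
        try {
            // Wrap instead of re-throwing directly, so the client call site
            // shows up in the stack trace alongside the server-side cause.
            throw new RuntimeException(
                    "State query failed on the server side", deserializeServerFailure());
        } catch (RuntimeException e) {
            e.printStackTrace(); // prints both client and server stack traces
        }
    }
}
{code}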



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Proposed updates to Flink project site

2017-01-20 Thread Ufuk Celebi
If you like you could also open JIRA issues for these points with a note about 
the discussion here. That way we make sure that we don't forget about our 
feedback.

– Ufuk

On 20 January 2017 at 12:02:14, Mike Winters (mwi...@gmail.com) wrote:
> Yes, thanks to all for helping! There are still some comments / ideas from
> this thread (such as a navigation with sub-menus, a cleaner nav menu on the
> mobile site, reworking some of the information-dense pages) that I think
> should be addressed but haven't been yet, and so I'll be making updates on
> an ongoing basis.



[jira] [Created] (FLINK-5592) Wrong number of RowSerializers with nested Rows in Collection mode

2017-01-20 Thread Anton Solovev (JIRA)
Anton Solovev created FLINK-5592:


 Summary: Wrong number of RowSerializers with nested Rows in 
Collection mode
 Key: FLINK-5592
 URL: https://issues.apache.org/jira/browse/FLINK-5592
 Project: Flink
  Issue Type: Bug
Reporter: Anton Solovev
Assignee: Anton Solovev


{code}
// Imports added for completeness (package names as of the 1.2/1.3 code base);
// `config` comes from the surrounding TableProgramsTestBase.
import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
import org.apache.flink.api.java.typeutils.RowTypeInfo
import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.table.api.{Table, TableEnvironment}
import org.apache.flink.table.sources.BatchTableSource
import org.apache.flink.types.Row
import org.junit.Test
import scala.collection.JavaConverters._

@Test
  def testNestedRowTypes(): Unit = {
val env = ExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(env, config)

tEnv.registerTableSource("rows", new MockSource)

val table: Table = tEnv.scan("rows")
val nestedTable: Table = tEnv.scan("rows").select('person)

table.printSchema()
nestedTable.printSchema()
val collect: Seq[Row] = nestedTable.collect()
print(collect)
  }

  class MockSource extends BatchTableSource[Row] {
import org.apache.flink.api.java.ExecutionEnvironment
import org.apache.flink.api.java.DataSet

override def getDataSet(execEnv: ExecutionEnvironment): DataSet[Row] = {
  val data = List(
Row.of(Row.of("data_1", "dob"), Row.of("info_4", "dub")),
Row.of(Row.of("data_1", "dob"), Row.of("info_4", "dub")),
Row.of(Row.of("data_1", "dob"), Row.of("info_4", "dub")))
  execEnv.fromCollection(data.asJava, getReturnType)
}

override def getReturnType: TypeInformation[Row] = {
  new RowTypeInfo(
Array[TypeInformation[_]](
  new RowTypeInfo(
Array[TypeInformation[_]](BasicTypeInfo.STRING_TYPE_INFO, 
BasicTypeInfo.STRING_TYPE_INFO),
Array("name", "age"))),
Array("person")
  )
}
  }
{code}

throws {{java.lang.RuntimeException: Row arity of from does not match 
serializers}}

stacktrace 
{code}
at org.apache.flink.api.java.typeutils.runtime.RowSerializer.copy(RowSerializer.java:82)
at org.apache.flink.api.java.typeutils.runtime.RowSerializer.copy(RowSerializer.java:36)
at org.apache.flink.api.common.operators.GenericDataSourceBase.executeOnCollections(GenericDataSourceBase.java:234)
at org.apache.flink.api.common.operators.CollectionExecutor.executeDataSource(CollectionExecutor.java:218)
at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:154)
at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:130)
at org.apache.flink.api.common.operators.CollectionExecutor.executeDataSink(CollectionExecutor.java:181)
at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:157)
at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:130)
at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:114)
at org.apache.flink.api.java.CollectionEnvironment.execute(CollectionEnvironment.java:35)
at org.apache.flink.test.util.CollectionTestEnvironment.execute(CollectionTestEnvironment.java:47)
at org.apache.flink.test.util.CollectionTestEnvironment.execute(CollectionTestEnvironment.java:42)
at org.apache.flink.api.scala.ExecutionEnvironment.execute(ExecutionEnvironment.scala:672)
at org.apache.flink.api.scala.DataSet.collect(DataSet.scala:547)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] (Not) tagging reviewers

2017-01-20 Thread Stephan Ewen
@Alexey - Pull Requests backlog is going pretty crazy, I agree.

That is not because the committers are not working on pull requests; there
are simply so many of them.
We are looking for new committers (and discussing in the PMC).

Tagging is not going to make this better, I believe. It may make it worse,
because it discourages non-tagged community members from picking up a pull
request.



On Mon, Jan 16, 2017 at 4:54 PM, Anton Solovev 
wrote:

> Hi, Alexey
>
> I will check abandoned PRs to reduce obviously outdated ones and add them
> to a cleanup list https://issues.apache.org/jira/browse/FLINK-5384
>
>
> -Original Message-
> From: Alexey Demin [mailto:diomi...@gmail.com]
> Sent: Monday, January 16, 2017 5:05 PM
> To: dev@flink.apache.org
> Subject: Re: [DISCUSS] (Not) tagging reviewers
>
> Hi all
>
> A view from my perspective:
> in the middle of summer - 150 PRs
> in the middle of autumn - 180
> now 206.
>
> This is a mix of bugfixes and improvements.
> I understand that work on new features is important, but when small and
> trivial fixes stay in the PR state for more than 2-3 months, users start
> thinking about switching to another product.
>
> The only way to push people to merge these fixes into master is tags.
>
> I am not talking about big changes, only about small and trivial ones with
> a review of less than 5 minutes.
>
> Features are important, but if these features work incorrectly, then users
> can choose a more stable product without any hesitation.
>
> Thanks
> Alexey Diomin
>
>
>
> 2017-01-16 16:36 GMT+04:00 Ufuk Celebi :
>
> > On 16 January 2017 at 12:59:04, Paris Carbone (par...@kth.se) wrote:
> > > > Though, when someone has started reviewing a PR and shows interest
> > > it probably makes sense to finish doing so. Wouldn’t tagging be
> > > acceptable there?
> > > In those case tagging triggers direct notifications, so that people
> > > already involved in a conversation get reminded and answer pending
> > > questions.
> >
> > I think that's totally fine Paris since it is more of a reminder in
> > that case.
> >
> > Stephan is referring to PRs that have a last line in the description
> > like "@XZY for review please".
> >
> > – Ufuk
> >
> >
> >
>


Unregistering Managed State in Operator Backend

2017-01-20 Thread Paris Carbone
Hi folks,

I have a little question regarding managed state in the operator state backend, 
in case someone can help. 

Is there some convenient way (planned or under development) to completely 
unregister a state entry (e.g. a ListState) with a given id from the backend?
It is fairly easy to register new states dynamically (i.e. with 
getOperatorState(…)), so why not be able to discard them as well?

I would find this feature extremely convenient for a fault-tolerance-related 
PR I am working on, and I can think of many use cases that might need it.
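
To make the request concrete, a hypothetical API sketch (none of this exists 
today; all names are illustrative):

{code}
// Hypothetical sketch only (kept in comments since nothing here exists yet):
//
//   // existing path: states are registered/obtained dynamically
//   ListState<Long> pending =
//       getOperatorState(new ListStateDescriptor<>("pending", Long.class));
//
//   // requested counterpart: discard the entry and its checkpointed data
//   operatorStateBackend.unregisterState("pending");
{code}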


Paris



Re: [DISCUSS] (Not) tagging reviewers

2017-01-20 Thread Haohui Mai
Hi,

Usually this problem can be addressed well by growing the community. I am
just wondering whether there is a wiki describing how non-committers can
help with reviewing patches?

I'd be glad to help out with reviewing patches.

Regards,
Haohui

On Fri, Jan 20, 2017 at 7:42 AM Stephan Ewen  wrote:

> @Alexey - Pull Requests backlog is going pretty crazy, I agree.
>
> That is not because the committers are not working on pull requests; there
> are simply so many of them.
> We are looking for new committers (and discussing in the PMC).
>
> Tagging is not going to make this better, I believe. It may make it worse,
> because it discourages non-tagged community members from picking up a pull
> request.
>
>
>
> On Mon, Jan 16, 2017 at 4:54 PM, Anton Solovev 
> wrote:
>
> > Hi, Alexey
> >
> > I will check abandoned PRs to reduce obviously outdated ones and add them
> > to a cleanup list https://issues.apache.org/jira/browse/FLINK-5384
> >
> >
> > -Original Message-
> > From: Alexey Demin [mailto:diomi...@gmail.com]
> > Sent: Monday, January 16, 2017 5:05 PM
> > To: dev@flink.apache.org
> > Subject: Re: [DISCUSS] (Not) tagging reviewers
> >
> > Hi all
> >
> > A view from my perspective:
> > in the middle of summer - 150 PRs
> > in the middle of autumn - 180
> > now 206.
> >
> > This is a mix of bugfixes and improvements.
> > I understand that work on new features is important, but when small and
> > trivial fixes stay in the PR state for more than 2-3 months, users start
> > thinking about switching to another product.
> >
> > The only way to push people to merge these fixes into master is tags.
> >
> > I am not talking about big changes, only about small and trivial ones with
> > a review of less than 5 minutes.
> >
> > Features are important, but if these features work incorrectly, then users
> > can choose a more stable product without any hesitation.
> >
> > Thanks
> > Alexey Diomin
> >
> >
> >
> > 2017-01-16 16:36 GMT+04:00 Ufuk Celebi :
> >
> > > On 16 January 2017 at 12:59:04, Paris Carbone (par...@kth.se) wrote:
> > > > > Though, when someone has started reviewing a PR and shows interest
> > > > it probably makes sense to finish doing so. Wouldn’t tagging be
> > > > acceptable there?
> > > > In those case tagging triggers direct notifications, so that people
> > > > already involved in a conversation get reminded and answer pending
> > > > questions.
> > >
> > > I think that's totally fine Paris since it is more of a reminder in
> > > that case.
> > >
> > > Stephan is referring to PRs that have a last line in the description
> > > like "@XZY for review please".
> > >
> > > – Ufuk
> > >
> > >
> > >
> >
>


[jira] [Created] (FLINK-5593) Modify current dcos-flink implementation to use runit service

2017-01-20 Thread Vijay Srinivasaraghavan (JIRA)
Vijay Srinivasaraghavan created FLINK-5593:
--

 Summary: Modify current dcos-flink implementation to use runit 
service
 Key: FLINK-5593
 URL: https://issues.apache.org/jira/browse/FLINK-5593
 Project: Flink
  Issue Type: Sub-task
Reporter: Vijay Srinivasaraghavan
Assignee: Vijay Srinivasaraghavan


The current Flink DCOS integration provides basic functionality to run Flink on 
DCOS using Marathon (https://github.com/mesosphere/dcos-flink-service), on top 
of which support for additional security and HDFS configurations needs to be 
built. We would like to follow an integration approach similar to how Spark is 
handled (https://github.com/mesosphere/spark-build/tree/master/docker). The 
purpose of this JIRA is to port the current changes to use the runit service and 
provide a structure that supports including additional configurations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5594) Use higher-resolution graphic on Flink project site homepage

2017-01-20 Thread Mike Winters (JIRA)
Mike Winters created FLINK-5594:
---

 Summary: Use higher-resolution graphic on Flink project site 
homepage
 Key: FLINK-5594
 URL: https://issues.apache.org/jira/browse/FLINK-5594
 Project: Flink
  Issue Type: Improvement
Reporter: Mike Winters
Priority: Minor


On some monitors, the graphic on the Flink project site homepage 
(http://flink.apache.org/index.html) is blurry. It should be replaced with a 
higher-resolution image. The current image's filename is 
"flink-front-graphic-update.png".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5595) Add links to sub-sections in the left-hand navigation bar

2017-01-20 Thread Mike Winters (JIRA)
Mike Winters created FLINK-5595:
---

 Summary: Add links to sub-sections in the left-hand navigation bar
 Key: FLINK-5595
 URL: https://issues.apache.org/jira/browse/FLINK-5595
 Project: Flink
  Issue Type: Improvement
  Components: Project Website
Reporter: Mike Winters
Priority: Minor


Some pages on the Flink project site (such as 
http://flink.apache.org/introduction.html) include a table of contents at the 
top. The sections from the ToC are not exposed in the left-hand nav when the 
page is active, but this could be a useful addition, especially for longer, 
content-heavy pages. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5596) Add padding to nav menu on mobile site

2017-01-20 Thread Mike Winters (JIRA)
Mike Winters created FLINK-5596:
---

 Summary: Add padding to nav menu on mobile site
 Key: FLINK-5596
 URL: https://issues.apache.org/jira/browse/FLINK-5596
 Project: Flink
  Issue Type: Improvement
  Components: Project Website
Reporter: Mike Winters
Priority: Minor


Improve the appearance of the nav menu on the Flink mobile site by adding 
padding on the left side of the menu options. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5597) Improve the LocalClusteringCoefficient documentation

2017-01-20 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-5597:


 Summary: Improve the LocalClusteringCoefficient documentation
 Key: FLINK-5597
 URL: https://issues.apache.org/jira/browse/FLINK-5597
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Gelly
Reporter: Vasia Kalavri


The LocalClusteringCoefficient usage section should explain what the 
algorithm's output is and how to retrieve the actual local clustering 
coefficient scores from it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Need help on understanding flink runtime and window function

2017-01-20 Thread Fritz Budiyanto
Hi Flink Dev,

I’m new to Flink and have a few questions below:

1. I’m trying to understand the Flink runtime on the server side, and couldn’t 
figure out where the code that executes the window function sum below lives. I 
wanted to set a breakpoint but got lost in the code base. Could someone shed 
some light?
val counts = text.flatMap { _.toLowerCase.split("\\W+") filter { _.nonEmpty } }
  .map { (_, 1) }
  .keyBy(0)
  .window(ProcessingTimeSessionWindows.withGap(Time.seconds(5)))
  .sum(1)
2. How does the Scala jar file get executed on the server side? Is there 
internal documentation explaining the process?

3. I’m planning to use ContinuousProcessingTimeTrigger on a session window. Is 
there a way in the window function to figure out whether the window is about 
to be retired? For instance, on the recurring trigger I’m planning to do some 
processing. When the window is about to be retired, I’d like to do different 
processing (i.e. computing the final value and flushing). Any suggestions?
 
—
Fritz

Re: [DISCUSS] (Not) tagging reviewers

2017-01-20 Thread jincheng sun
I totally agree with all of your ideas.

Best wishes,

SunJincheng.

Stephan Ewen wrote on Monday, 16 January 2017, at 19:42:

> Hi!
>
> I have seen that recently many pull requests designate reviewers by writing
> "@personA review please" or so.
>
> I am personally quite strongly against that; I think it hurts the community
> work:
>
>   - The same few people usually get "designated" and will typically get
> overloaded and often not do the review.
>
>   - At the same time, this discourages other community members from looking
> at the pull request, which is totally undesirable.
>
>   - In general, review participation should be "pull based" (a person decides
> what they want to work on), not "push based" (a random person pushes work to
> another person). Push-based just creates the wrong feeling in a community
> of volunteers.
>
>   - In many cases the designated reviewers are not the ones most
> knowledgeable in the code, which is understandable, because how should
> contributors know whom to tag?
>
> Long story short, why don't we just drop that habit?
>
> Greetings,
> Stephan
>


Re: [DISCUSS] Improvements to flink's cost based optimizer

2017-01-20 Thread Fabian Hueske
Hi Kurt,

thanks for breaking down the overall effort into smaller tasks and creating the
corresponding JIRA issues.

Using default estimates for unknown tables can be quite risky, especially
for statistics like cardinality.
In these cases, collecting basic stats while writing the input (i.e., an
arbitrary DataSet) to some storage and reading the data back might be a
viable default. A frequently requested feature for the DataSet API is to
cache a DataSet in memory (with spilling to local disk). This might help
here as well.
We can also offer a hook for the user to inject base statistics such as
cardinality.

Regarding the cost model and parallelism: right now, we mainly optimize to
reduce data volume (network + disk IO). Optimizing for execution time
(which is required when choosing the parallelism) is harder because you
need to combine network and disk IO, CPU, and parallelism into a cost
formula. How much these factors contribute to execution time is very
specific to the hardware / cluster you are running on. If we want to go
in this direction, we might need to have a method to benchmark the system
and calibrate the cost model.
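
As a purely illustrative sketch of what such a calibrated formula could look
like (the weights and the formula itself are assumptions, not Flink's actual
cost model):

{code}
// Weighted cost of an operator; the weights would come from benchmarking
// the concrete cluster (all names and the formula are illustrative).
public class CostModelSketch {
    static double cost(double netBytes, double diskBytes, double cpuOps,
                       double wNet, double wDisk, double wCpu) {
        return wNet * netBytes + wDisk * diskBytes + wCpu * cpuOps;
    }

    public static void main(String[] args) {
        // Example: a network-bound cluster weighs net IO more than disk IO.
        System.out.println(cost(1e9, 2e9, 5e8, 1.5, 1.0, 0.1));
    }
}
{code}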

Best,
Fabian


2017-01-19 4:47 GMT+01:00 Kurt Young :

> Hi Fabian,
>
> Thanks for your detailed response and sorry for the late response. Your
> opinions all make sense to me, and here is some thoughts to your open
> questions:
>
> - Regarding tables without sufficient statistics, especially the kind of
> "dynamic" table derived from some arbitrary DataSet whose statistics cannot
> be analyzed beforehand, I think in the first version we can just provide
> some fixed default statistics to let the process work. Another approach is
> to save the DataSet as some intermediate result table and do the statistics
> analysis before further operations. In the future, a more advanced and ideal
> way is to keep collecting statistics while running the job and have a way to
> dynamically modify the plan during job execution.
>
> - Regarding parallelism control, I think it's a good use case for
> statistics. Once we have a good cost estimation and know what performance
> the user expects from the job, we can definitely do some auto-tuning for them.
>
> I have opened a JIRA to track the status and detailed implementation steps
> for this issue: https://issues.apache.org/jira/browse/FLINK-5565. Whoever
> is interested in this topic can continue the discussion there, either in
> the parent JIRA or the sub-tasks.
>
> Best,
> Kurt
>
> On Wed, Jan 11, 2017 at 5:56 AM, Fabian Hueske  wrote:
>
> > Hi Kurt,
> >
> > thanks for starting this discussion!
> > Although, we use Calcite's cost based optimizer we do not use its full
> > potential. As you correctly identified, this is mainly due to the lack of
> > reliable statistics.
> > Moreover, we use Calcite only for logical optimization, i.e., the
> optimizer
> > basically rewrites the query and pushed down filters and projections (no
> > join reodering yet).
> > For batch queries, the logically optimized plan is translated into a
> > DataSet program and the DataSet optimizer chooses the physical execution
> > plan (shipping strategies, hash vs. merge join, etc.).
> > It would be great if we could improve this situation for batch tables
> > (stats on streaming tables might change while a query is executed) by
> doing
> > the complete optimization (logical & physical) in Calcite.
> > However, this will be a long way.
> >
> > I agree with your first steps to designing a catalog / stats store
> > component that can store and provide table and column statistics.
> > Once we have stats about tables, we can start to improve the optimization
> > step-by-step.
> > The first thing should be to improve the logical optimization and enable
> > join reordering.
> > Once we have that, we can start to choose execution plans for operators by
> > using the optimizer hints of the DataSet optimizer. This will also
> involve
> > tracking the physical properties of intermediate results (sorting,
> > partitioning, etc.) in Calcite.
> >
> > I would also recommend to keep the cost model as simple as possible.
> > A detailed cost model is hard to reason about and does not really help if
> > its parameters are imprecise.
> > There are just too many numbers to get wrong like input cardinalities,
> > selectivities, or cost ratio of disk to net IO.
> >
> > A few open questions remain:
> > - How do we handle cases where there are not sufficient statistics for all
> > tables? For example, if we have a query on a Table which was derived from
> > a DataSet (no stats) and which is joined with some external tables with stats.
> > - Should we control the parallelism of operators based on cardinality
> > information?
> >
> >
> > Best, Fabian
> >
> > 2017-01-10 15:22 GMT+01:00 Kurt Young :
> >
> > > Hi,
> > >
> > > Currently Flink already uses a cost-based optimizer, but due to the
> > > lack of accurate statistics and the simple cost model, we actually
> > > don't gain much from this f

Re: [DISCUSS] proposal for the User Defined AGGregate (UDAGG)

2017-01-20 Thread Fabian Hueske
Hi Shaoxuan,

thanks a lot for this great design doc.
I think user defined aggregation functions are a very important feature for
the Table API and SQL.

Have you thought about how the aggregation functions will be embedded in
Flink functions?
At the moment, we have a generic Flink function which is configured with
aggregation functions, i.e., we do not leverage code generation here.
Do you plan to embed built-in and user-defined aggregations functions that
implement the proposed API with code generation?

Can you maybe extend the JIRA or design document with this information?

Thank you,
Fabian

2017-01-18 20:55 GMT+01:00 Shaoxuan Wang :

> Hi everyone,
> I have drafted the design doc (link is provided below) for UDAGG, and
> created the JIRA (FLINK-5564) to track the progress of this design.
> Special thanks to Stephan and Fabian for their advice and help.
>
> Please check the design doc, feel free to share your comments in the google
> doc:
> https://docs.google.com/document/d/19JXK8jLIi8IqV9yf7hOs_Oz6
> 7yXOypY7Uh5gIOK2r-U/edit
>
> Regards,
> Shaoxuan
>
> On Wed, Jan 11, 2017 at 6:09 AM, Fabian Hueske  wrote:
>
> > Hi Shaoxuan,
> >
> > user-defined aggregates would be a great addition to the Table API / SQL.
> > I completely agree that the current (internal) interface is not well
> suited
> > as an external interface and needs to be redesigned if exposed to users.
> >
> > We need to careful think about this new interface and how we can
> integrate
> > it with the DataStream (and DataSet) API to support all required
> > operations, esp. with respect to null aggregates and support for
> combining
> > / merging.
> > I agree that for efficient execution, we should avoid WindowFunctions
> > (large state) and FoldFunction (not mergeable). If we need a new
> interface
> > in the DataStream API, we need to discuss this in more detail.
> > I think we need a bit more information about the proposed UDAGG interface
> > to discuss how this can be mapped to DataStream operators.
> >
> > Support for retraction will be required for our future plans with the
> > streaming Table API / SQL interface.
> >
> > Looking forward to your proposal,
> > Fabian
> >
> > 2017-01-10 15:40 GMT+01:00 Shaoxuan Wang :
> >
> > > Hello everyone,
> > >
> > > I am writing this email to propose a new User Defined Aggregate
> > interface.
> > > We were trying to leverage the existing Aggregate interface, but
> > > unfortunately we realized that it is not sufficient to meet all our
> > needs.
> > > Here are the obstacles we have observed:
> > > 1) The current aggregate interface is not very concise for users. One
> > > needs to know the design details of the intermediate Row buffer before
> > > implementing an Aggregate. Seven functions are needed even for a simple
> > > Count aggregate. We had better make the UDAGG interface much more concise.
> > > 2) The current aggregate function can only be applied to one single
> > > column. There are many scenarios which require the aggregate function to
> > > take multiple columns as input.
> > > 3) “Retraction” is not covered by the current Aggregate.
> > >
> > > For #1, I am thinking that instead of letting users manipulate the
> > > intermediate buffer, we could potentially put the entire Aggregate
> > > instance or a subclass instance of Aggregate into the Row buffer, such
> > > that the user does not need to know how the Aggregate state is maintained
> > > by the framework.
> > > But to achieve this goal, we probably need a new DataStream API. The
> > > existing reduce API does not work with two different types of inputs (in
> > > this proposal, the inputs will be upstream values and the instance of the
> > > currently accumulated Aggregate), while the fold API is not able to merge
> > > two Aggregate results (which is usually needed for merging two session
> > > windows).
> > >
> > > For #3, besides the aggregate itself, there are a few other things that
> > > need to be taken care of to fully support retractions. I will share a
> > > separate, concrete proposal about how to generate and process
> > > retractions, and how it works along with this newly proposed UDAGG.
> > >
> > > I would really appreciate it if you could share your opinions on this
> > > proposal, especially regarding the DataStream API needed for #1. Also, if
> > > there is anything else you think should be added to the UDAGG, please
> > > feel free to share it with us. I will draft my proposal in a Google doc
> > > and share it with the Flink dev group very soon.
> > >
> > > Thanks,
> > > Shaoxuan
> > >
> >
>
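
To make the obstacles in #1 concrete, an accumulator-style interface along the
lines discussed above might look like this (a sketch only; the actual
interface is specified in the linked design doc):

{code}
// Sketch of a concise, accumulator-style UDAGG interface (illustrative only).
public interface AggregateFunction<IN, ACC, OUT> {
    ACC createAccumulator();              // replaces manual Row-buffer setup
    void accumulate(ACC acc, IN value);   // IN can carry multiple columns
    ACC merge(ACC a, ACC b);              // needed when merging session windows
    OUT getValue(ACC acc);
}
{code}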


[jira] [Created] (FLINK-5598) Return jar name when jar is uploaded

2017-01-20 Thread Sendoh (JIRA)
Sendoh created FLINK-5598:
-

 Summary: Return jar name when jar is uploaded
 Key: FLINK-5598
 URL: https://issues.apache.org/jira/browse/FLINK-5598
 Project: Flink
  Issue Type: Improvement
  Components: Web Client
Reporter: Sendoh


As a developer, I want the jar file name to be returned when a jar is uploaded.
Currently it returns nothing, as the code shows:

File newFile = new File(jarDir, UUID.randomUUID() + "_" + filename);
if (tempFile.renameTo(newFile)) {
    // all went well
    return "{}";
}

Ref: 
https://github.com/apache/flink/blob/master/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/handlers/JarUploadHandler.java#L58

My proposal would be:

return "{\"fileName\": \"" + newFile.getName() + "\"}";

Any suggestion is welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] (Not) tagging reviewers

2017-01-20 Thread Fabian Hueske
Hi Haohui,

reviewing pull requests is a great way of contributing to the community!

I am not aware of specific instructions for the review process. There are
some dos and don'ts on our "contribute code" page [1] that should be
considered. Apart from that, I think the best way to start is to become
familiar with a certain part of the code base (reading code, contributing)
and then to look out for pull requests that address the part you are
familiar with.

The review does not have to cover all aspects of a PR (a committer will
have a look as well), but from my personal experience the effort to review
a PR is often much lower if some other person has already had a look at it
and given feedback.
I think this can help a lot to reduce the review "load" on the committers.
Maybe you find some contributors who are interested in the same components
as you and you can start reviewing each others code.

Thanks,
Fabian

[1] http://flink.apache.org/contribute-code.html#coding-guidelines


2017-01-20 23:02 GMT+01:00 jincheng sun :

> I totally agree with all of your ideas.
>
> Best wishes,
>
> SunJincheng.
>
> Stephan Ewen wrote on Monday, 16 January 2017, at 19:42:
>
> > Hi!
> >
> > I have seen that recently many pull requests designate reviewers by writing
> > "@personA review please" or so.
> >
> > I am personally quite strongly against that; I think it hurts the community
> > work:
> >
> >   - The same few people usually get "designated" and will typically get
> > overloaded and often not do the review.
> >
> >   - At the same time, this discourages other community members from looking
> > at the pull request, which is totally undesirable.
> >
> >   - In general, review participation should be "pull based" (a person decides
> > what they want to work on), not "push based" (a random person pushes work to
> > another person). Push-based just creates the wrong feeling in a community
> > of volunteers.
> >
> >   - In many cases the designated reviewers are not the ones most
> > knowledgeable in the code, which is understandable, because how should
> > contributors know whom to tag?
> >
> > Long story short, why don't we just drop that habit?
> >
> > Greetings,
> > Stephan
> >
>


Re: Taking time off

2017-01-20 Thread Fabian Hueske
Hi Max,

Thanks for all your efforts!
Hope to see you back soon.

Take care, Fabian


2017-01-16 11:22 GMT+01:00 Vasiliki Kalavri :

> Hi Max,
>
> thank you for all your work! Enjoy your time off and hope to have you back
> with us soon ^^
>
> Cheers,
> -Vasia.
>
> On 14 January 2017 at 09:03, Maximilian Michels  wrote:
>
> > Dear Squirrels,
> >
> > Thank you! It's been very exciting to see the Flink community grow and
> > flourish over the past two years.
> >
> > For the beginning of this year, I decided to take some time off, which
> > means I'll be less engaged on the mailing list or on GitHub/JIRA.
> >
> > In the meantime, if you have any questions I might be able to answer,
> feel
> > free to contact me. Looking forward to see the squirrels rise further!
> >
> > Best,
> > Max
> >
>