Discussion of [FLINK-10134] UTF-16 support for TextInputFormat bug fix

2018-09-27 Thread x1q1j1
Dear all,

I would like to discuss this JIRA issue with everyone so that we can handle
it better. Here are some of my thoughts:


1. Where should the BOM be read? I think that when reading starts at the
beginning of a file, we still need logic to process the BOM. We can add a
variable that records the BOM encoding detected for the file, for example in
the createInputSplits function (see the BOM-detection sketch after the code
below).
2. We can then use that variable to determine whether the file is UTF-8,
UTF-16, or UTF-32 with a BOM, and use the encoding type to control the byte
width when handling the end of each line. I found that the previous bug is
really an encoding problem combined with improper handling of the end of
each line of records. To address this, I did the following:



String utf8 = "UTF-8";
String utf16 = "UTF-16";
String utf32 = "UTF-32";
int stepSize = 0;

String charsetName = this.getCharsetName();
if (charsetName.contains(utf8)) {
    stepSize = 1;
} else if (charsetName.contains(utf16)) {
    stepSize = 2;
} else if (charsetName.contains(utf32)) {
    stepSize = 4;
}

// Check if \n is used as delimiter and the end of this line is a \r,
// then remove \r from the line
if (this.getDelimiter() != null && this.getDelimiter().length == 1
        && this.getDelimiter()[0] == NEW_LINE && offset + numBytes >= stepSize
        && bytes[offset + numBytes - stepSize] == CARRIAGE_RETURN) {
    numBytes -= stepSize;
}
numBytes = numBytes - stepSize + 1;

return new String(bytes, offset, numBytes, this.getCharsetName());
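
For point 1, here is a minimal sketch of what the BOM detection could look
like (a hypothetical helper, not the code from my PR; note that the UTF-32
marks must be checked before the UTF-16 ones because the UTF-32LE BOM begins
with the UTF-16LE BOM):

// Hypothetical helper: inspect the first bytes of a file and return the
// charset indicated by its BOM, or null if no BOM is present.
private static String detectBomCharset(byte[] header) {
    if (header.length >= 4
            && (header[0] & 0xFF) == 0x00 && (header[1] & 0xFF) == 0x00
            && (header[2] & 0xFF) == 0xFE && (header[3] & 0xFF) == 0xFF) {
        return "UTF-32BE";
    }
    if (header.length >= 4
            && (header[0] & 0xFF) == 0xFF && (header[1] & 0xFF) == 0xFE
            && (header[2] & 0xFF) == 0x00 && (header[3] & 0xFF) == 0x00) {
        return "UTF-32LE";
    }
    if (header.length >= 3
            && (header[0] & 0xFF) == 0xEF && (header[1] & 0xFF) == 0xBB
            && (header[2] & 0xFF) == 0xBF) {
        return "UTF-8";
    }
    if (header.length >= 2
            && (header[0] & 0xFF) == 0xFE && (header[1] & 0xFF) == 0xFF) {
        return "UTF-16BE";
    }
    if (header.length >= 2
            && (header[0] & 0xFF) == 0xFF && (header[1] & 0xFF) == 0xFE) {
        return "UTF-16LE";
    }
    return null;
}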







   If the above is still unclear, you can see the detailed implementation in
the PR I submitted.
Here is the link to the PR: https://github.com/apache/flink/pull/6710
Here is the link to the JIRA issue: https://issues.apache.org/jira/browse/FLINK-10134



Looking forward to your reply



Best wishes.
qianjinxu

[jira] [Created] (FLINK-10456) Remove org.apache.flink.api.common.time.Deadline

2018-09-27 Thread tison (JIRA)
tison created FLINK-10456:
-

 Summary: Remove org.apache.flink.api.common.time.Deadline
 Key: FLINK-10456
 URL: https://issues.apache.org/jira/browse/FLINK-10456
 Project: Flink
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.7.0
Reporter: tison
Assignee: tison
 Fix For: 1.7.0


We already have {{scala.concurrent.duration.Deadline}}.

{{org.apache.flink.api.common.time.Deadline}} is not a richer extension of it.
I wonder in which situations we need a customized Deadline. If there are none,
introducing a weaker alternative seems unreasonable and raises confusion.

What do you think? cc [~StephanEwen] [~Zentol]





[jira] [Created] (FLINK-10455) Potential Kafka producer leak in case of failures

2018-09-27 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-10455:
---

 Summary: Potential Kafka producer leak in case of failures
 Key: FLINK-10455
 URL: https://issues.apache.org/jira/browse/FLINK-10455
 Project: Flink
  Issue Type: Bug
  Components: Kafka Connector
Affects Versions: 1.5.2
Reporter: Nico Kruber


If the Kafka brokers' timeout is too low for our checkpoint interval [1], we
may get a {{ProducerFencedException}}. Documentation around
{{ProducerFencedException}} explicitly states that we should close the producer
after encountering it.

By looking at the code, it doesn't seem like this is actually done in
{{FlinkKafkaProducer011}}. Also, in case one transaction's commit in
{{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} fails with an
exception, we don't clean up (nor try to commit) any other transaction.
From what I see, {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}}
simply iterates over the {{pendingCommitTransactions}}, which is not touched
during {{close()}}.

Now if we restart the failing job on the same Flink cluster, any resources from 
the previous attempt will still linger around.
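
For reference, a sketch of the handling that the Kafka documentation
recommends after fencing (plain producer API, not the connector code;
{{commitOrClose}} is an illustrative name):

{code:java}
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.errors.ProducerFencedException;

// Illustrative sketch: ProducerFencedException is unrecoverable, so the
// producer should be closed to release its threads and buffers.
private static void commitOrClose(KafkaProducer<byte[], byte[]> producer) {
    try {
        producer.commitTransaction();
    } catch (ProducerFencedException e) {
        producer.close(); // without this, the producer leaks across restarts
        throw e;
    }
}
{code}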


[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/connectors/kafka.html#kafka-011





Re: Handling burst I/O when using tumbling/sliding windows

2018-09-27 Thread Rong Rong
Hi Piotrek,

Yes, to be more clear:
1) the network I/O issue I am referring to is between Flink and the external
sink. We did not see issues between operators.
2) yes, we've also considered rate-limiting sink functions, which is
mentioned in the doc along with some of the pros and cons we identified (see
the sketch below).

This kind of problem seems to only occur in WindowOperator so far, but yes,
it can probably occur with any aligned, interval-based operator.
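
For reference, a minimal sketch of the rate-limiting-sink alternative
mentioned above, assuming Guava's RateLimiter (illustrative names, not the
approach the doc proposes):

import com.google.common.util.concurrent.RateLimiter;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

// Illustrative sketch of a sink that throttles its writes; the limiter rate
// and writeToExternalSystem() are placeholders.
public class ThrottledSink<T> extends RichSinkFunction<T> {

    private final double permitsPerSecond;
    private transient RateLimiter rateLimiter;

    public ThrottledSink(double permitsPerSecond) {
        this.permitsPerSecond = permitsPerSecond;
    }

    @Override
    public void open(Configuration parameters) {
        rateLimiter = RateLimiter.create(permitsPerSecond);
    }

    @Override
    public void invoke(T value, Context context) throws Exception {
        rateLimiter.acquire(); // blocks until a permit is available
        writeToExternalSystem(value);
    }

    private void writeToExternalSystem(T value) {
        // placeholder for the actual external write
    }
}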

--
Rong

On Wed, Sep 26, 2018 at 11:44 PM Piotr Nowojski 
wrote:

> Hi,
>
> Thanks for the proposal. Could you provide more
> background/explanation/motivation for why you need such a feature? What do you
> mean by “network I/O” degradation?
>
> On their own, burst writes shouldn’t cause problems within Flink. If they
> do, we might want to fix the original underlying problem and if they are
> causing problems in external systems, we also might think about other
> approaches to fix/handle the problem (write rate limiting?), which might be
> more general and not fixing only bursts originating from WindowOperator.
> I’m not saying that your proposal is bad or anything, but I would just like
> to have more context :)
>
> Piotrek.
>
> > On 26 Sep 2018, at 19:21, Rong Rong  wrote:
> >
> > Hi Dev,
> >
> > I was wondering if there's any previous discussion regarding how to handle
> > burst network I/O when deploying Flink applications with window operators.
> >
> > We've recently seen some significant network I/O degradation when trying to
> > use sliding windows to perform rolling aggregations. The pattern is very
> > periodic: output connections get no traffic for a period of time until a
> > burst at window boundaries (in our case every 5 minutes).
> >
> > We have drafted a doc
> > <https://docs.google.com/document/d/1fEhbcRgxxX8zFYD_iMBG1DCbHmTcTRfRQFXelPhMFiY/edit?usp=sharing>
> > on how we propose to handle it to smooth the output traffic spikes. Please
> > kindly take a look, any comments and suggestions are highly appreciated.
> >
> > --
> > Rong
>
>


[jira] [Created] (FLINK-10454) Travis fails on ScheduleOrUpdateConsumersTest

2018-09-27 Thread tison (JIRA)
tison created FLINK-10454:
-

 Summary: Travis fails on ScheduleOrUpdateConsumersTest
 Key: FLINK-10454
 URL: https://issues.apache.org/jira/browse/FLINK-10454
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.7.0
Reporter: tison
 Fix For: 1.7.0


This can even be reproduced locally. It may be a duplicate, but I file it as a reminder.

{code:java}
org.apache.flink.runtime.jobmanager.scheduler.ScheduleOrUpdateConsumersTest
Time elapsed: 4.514 sec <<< ERROR!
java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:433)
    at sun.nio.ch.Net.bind(Net.java:425)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:128)
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:558)
    at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1358)
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:501)
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:486)
    at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:1019)
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:254)
    at org.apache.flink.shaded.netty4.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:366)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
    at java.lang.Thread.run(Thread.java:748)
{code}





[jira] [Created] (FLINK-10453) Move hadoop 2.4 travis profile into cron job

2018-09-27 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-10453:


 Summary: Move hadoop 2.4 travis profile into cron job
 Key: FLINK-10453
 URL: https://issues.apache.org/jira/browse/FLINK-10453
 Project: Flink
  Issue Type: Sub-task
  Components: Travis
Affects Versions: 1.7.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.7.0








[jira] [Created] (FLINK-10452) Expose Additional Metrics to Reason about Statesize

2018-09-27 Thread Konstantin Knauf (JIRA)
Konstantin Knauf created FLINK-10452:


 Summary: Expose Additional Metrics to Reason about Statesize
 Key: FLINK-10452
 URL: https://issues.apache.org/jira/browse/FLINK-10452
 Project: Flink
  Issue Type: Improvement
  Components: Metrics
Reporter: Konstantin Knauf


For monitoring purposes it would be helpful if Flink could expose metrics
about the number of keys/windows for each registered keyed state (a rough
user-level sketch follows below the open questions).

Open Questions:
* One Metric per Registered State? One Metric per KeyedOperator?
* Performance Impact (should this be default behavior?)
* Possible to know the number of windows during runtime?
* RocksDB only gives you an estimate of the number of keys. It would be nice
if we could derive the exact number inside Flink. This would also help in
sizing the RocksDB instances and estimating their memory footprint.
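
For illustration, a rough user-level sketch of such a gauge (assumed names;
not a proposal for the backend-side implementation; the count is per subtask
and resets on restore):

{code:java}
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Gauge;
import org.apache.flink.util.Collector;

// Hypothetical sketch: count the keys this subtask has seen via a per-key
// "seen" flag in keyed state, and expose the count as a gauge.
public class KeyCountingFunction extends RichFlatMapFunction<String, String> {

    private transient ValueState<Boolean> seen;
    private transient long keysSeen;

    @Override
    public void open(Configuration parameters) {
        seen = getRuntimeContext().getState(
                new ValueStateDescriptor<>("seen", Boolean.class));
        getRuntimeContext().getMetricGroup()
                .gauge("keysSeen", (Gauge<Long>) () -> keysSeen);
    }

    @Override
    public void flatMap(String value, Collector<String> out) throws Exception {
        if (seen.value() == null) {
            seen.update(true);
            keysSeen++; // first time this subtask sees the current key
        }
        out.collect(value);
    }
}
{code}

This would be used on a keyed stream, e.g.
{{stream.keyBy(...).flatMap(new KeyCountingFunction())}}.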





[jira] [Created] (FLINK-10451) TableFunctionCollector should handle the life cycle of ScalarFunction

2018-09-27 Thread Ruidong Li (JIRA)
Ruidong Li created FLINK-10451:
--

 Summary: TableFunctionCollector should handle the life cycle of 
ScalarFunction
 Key: FLINK-10451
 URL: https://issues.apache.org/jira/browse/FLINK-10451
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Reporter: Ruidong Li
Assignee: Ruidong Li


Consider the following query:

table.join(udtf('a)).where(udf('b))

The filter will be pushed into DataSetCorrelate/DataStreamCorrelate without
open() and close() being triggered on the ScalarFunction.
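
For context, a minimal sketch of a ScalarFunction that breaks under this
behavior (illustrative code, not taken from the issue):

{code:java}
import java.util.regex.Pattern;
import org.apache.flink.table.functions.FunctionContext;
import org.apache.flink.table.functions.ScalarFunction;

// Illustrative UDF: eval() depends on setup done in open(), so it fails with
// a NullPointerException if the runtime never calls open().
public class RegexUdf extends ScalarFunction {

    private transient Pattern pattern;

    @Override
    public void open(FunctionContext context) throws Exception {
        pattern = Pattern.compile("b.*"); // expensive one-time setup
    }

    public boolean eval(String s) {
        return pattern.matcher(s).matches(); // NPE when open() was skipped
    }
}
{code}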





[jira] [Created] (FLINK-10450) Broken links in the documentation

2018-09-27 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-10450:


 Summary: Broken links in the documentation
 Key: FLINK-10450
 URL: https://issues.apache.org/jira/browse/FLINK-10450
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Project Website
Affects Versions: 1.7.0
Reporter: Chesnay Schepler
 Fix For: 1.7.0


{code}
[2018-09-27 09:57:51] ERROR `/flinkdev/building.html' not found.
[2018-09-27 09:57:51] ERROR `/dev/stream/dataset_transformations.html' not 
found.
[2018-09-27 09:57:51] ERROR `/dev/stream/windows.html' not found.
---
Found 3 broken links.
Search for page containing broken link using 'grep -R BROKEN_PATH DOCS_DIR'
{code}





[jira] [Created] (FLINK-10449) Docker flink hostname is locked to "cluster"

2018-09-27 Thread Mike Pedersen (JIRA)
Mike Pedersen created FLINK-10449:
-

 Summary: Docker flink hostname is locked to "cluster"
 Key: FLINK-10449
 URL: https://issues.apache.org/jira/browse/FLINK-10449
 Project: Flink
  Issue Type: Bug
  Components: Docker
Affects Versions: 1.5.4
Reporter: Mike Pedersen


After FLINK-8696, the second parameter of jobmanager.sh is interpreted as the
hostname. This ends up overriding the hostname set in the Dockerfile, making
the jobmanager reject all taskmanager messages as non-local recipients.

See also
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-1-5-4-issues-w-TaskManager-connecting-to-ResourceManager-td23298.html#a23310





Re: Codespeed deployment for Flink

2018-09-27 Thread Till Rohrmann
Great addition Piotr and Nico. This is a really nice tool for the community
to monitor performance regressions in Flink.

On Tue, Sep 25, 2018 at 8:41 PM Peter Huang 
wrote:

> It is a great tool. Thanks for the contribution.
>
> On Tue, Sep 25, 2018 at 11:39 AM Jin Sun  wrote:
>
> > Great tool!
> >
> > > On Sep 24, 2018, at 10:59 PM, Zhijiang(wangzhijiang999) <
> > wangzhijiang...@aliyun.com.INVALID> wrote:
> > >
> > > Thanks @Piotr Nowojski  and @Nico Kruber for the good job!
> > >
> > > I have already benefited from this benchmark in previous PRs. May the
> > > visualization tool become even stronger and benefit the community more!
> > >
> > > Best,
> > > Zhijiang
> > > --
> > > From: Piotr Nowojski 
> > > Sent: Friday, September 21, 2018, 22:59
> > > To: dev 
> > > Cc: Nico Kruber 
> > > Subject: Codespeed deployment for Flink
> > >
> > > Hello community,
> > >
> > > For almost a year at data Artisans, Nico and I have been maintaining a
> > > setup that continuously evaluates Flink with benchmarks defined at
> > > https://github.com/dataArtisans/flink-benchmarks. With growing interest
> > > and after proving useful a couple of times, we have finally decided to
> > > publish the web UI layer of this setup. Currently it is accessible via
> > > the following (maybe not so?) temporary URL:
> > >
> > > http://codespeed.dak8s.net:8000
> > >
> > > This is a simple web UI to present performance changes over past and
> > > present commits to Apache Flink. It only has a couple of views and the
> > > most useful ones are:
> > >
> > > 1. Timeline
> > > 2. Comparison (I recommend using normalization)
> > >
> > > Timeline is useful for spotting unintended regressions or unexpected
> > > improvements. It is being updated every six hours.
> > > Comparison is useful for comparing a given branch (for example a
> > > pending PR) with the master branch. More about that later.
> > >
> > > The codespeed project on its own is just a presentation layer. As
> > > mentioned before, the only currently available benchmarks are defined in
> > > the flink-benchmarks repository and they are executed periodically or on
> > > demand by Jenkins on a single bare metal machine. The current setup
> > > limits us only to micro benchmarks (they are easier to
> > > setup/develop/maintain and have a quicker feedback loop compared to
> > > cluster benchmarks) but there is no reason preventing us from setting up
> > > other kinds of benchmarks and uploading their results to our codespeed
> > > instance as well.
> > >
> > > Regarding the comparison view. Currently data Artisans’ Flink mirror
> > > repository at https://github.com/dataArtisans/flink is configured to
> > > trigger benchmark runs on every commit/change that happens on the
> > > benchmark-request branch (We chose to use dataArtisans' repository here
> > > because we needed a custom GitHub hook that we couldn’t add to the
> > > apache/flink repository). Benchmarking usually takes between one and two
> > > hours. One obvious limitation at the moment is that there is only one
> > > comparison view, with one comparison branch, so trying to compare two
> > > PRs at the same time is impossible. However, we can tackle
> > > this problem once it becomes a real issue, not just a theoretical one.
> > >
> > > Piotrek & Nico
> > >
> >
> >
>


[jira] [Created] (FLINK-10448) VALUES clause is translated into a separate operator per value

2018-09-27 Thread Timo Walther (JIRA)
Timo Walther created FLINK-10448:


 Summary: VALUES clause is translated into a separate operator per 
value
 Key: FLINK-10448
 URL: https://issues.apache.org/jira/browse/FLINK-10448
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Reporter: Timo Walther


It seems that a SQL VALUES clause uses one operator per value under certain
conditions, which leads to a complicated job graph. Given that we need to
compile code for every operator in its open() method, this looks inefficient
to me.

For example, the following query creates and unions 6 operators together:
{code}
SELECT *
  FROM (
VALUES
  (1, 'Bob', CAST(0 AS BIGINT)),
  (22, 'Alice', CAST(0 AS BIGINT)),
  (42, 'Greg', CAST(0 AS BIGINT)),
  (42, 'Greg', CAST(0 AS BIGINT)),
  (42, 'Greg', CAST(0 AS BIGINT)),
  (1, 'Bob', CAST(0 AS BIGINT)))
AS UserCountTable(user_id, user_name, user_count)
{code}





[jira] [Created] (FLINK-10447) Create Bucketing Table Sink.

2018-09-27 Thread Suxing Lee (JIRA)
Suxing Lee created FLINK-10447:
--

 Summary: Create Bucketing Table Sink.
 Key: FLINK-10447
 URL: https://issues.apache.org/jira/browse/FLINK-10447
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Reporter: Suxing Lee
 Fix For: 1.7.0


It would be nice to integrate the Table API with the HDFS connectors so that
the rows in the tables can be directly pushed into HDFS. A rough sketch
follows below.
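
A rough sketch of the idea, wiring the existing {{BucketingSink}} into an
{{AppendStreamTableSink}} (assumed shape only; the row format, bucketing
policy, and exactly-once guarantees would need actual design work):

{code:java}
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.table.sinks.AppendStreamTableSink;
import org.apache.flink.table.sinks.TableSink;
import org.apache.flink.types.Row;

// Rough sketch only: rows are written via BucketingSink's default
// StringWriter (Row.toString()); a real sink would make the writer,
// bucketer and rolling policy configurable.
public class BucketingTableSink implements AppendStreamTableSink<Row> {

    private final String basePath;
    private String[] fieldNames;
    private TypeInformation<?>[] fieldTypes;

    public BucketingTableSink(String basePath) {
        this.basePath = basePath;
    }

    @Override
    public void emitDataStream(DataStream<Row> dataStream) {
        dataStream.addSink(new BucketingSink<Row>(basePath));
    }

    @Override
    public TypeInformation<Row> getOutputType() {
        return new RowTypeInfo(fieldTypes, fieldNames);
    }

    @Override
    public String[] getFieldNames() {
        return fieldNames;
    }

    @Override
    public TypeInformation<?>[] getFieldTypes() {
        return fieldTypes;
    }

    @Override
    public TableSink<Row> configure(String[] fieldNames, TypeInformation<?>[] fieldTypes) {
        // called by the Table API with the schema of the table to emit
        BucketingTableSink copy = new BucketingTableSink(basePath);
        copy.fieldNames = fieldNames;
        copy.fieldTypes = fieldTypes;
        return copy;
    }
}
{code}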





Re: Handling burst I/O when using tumbling/sliding windows

2018-09-27 Thread Piotr Nowojski
Hi,

Thanks for the proposal. Could you provide more 
background/explanation/motivation for why you need such a feature? What do you
mean by “network I/O” degradation? 

On their own, burst writes shouldn’t cause problems within Flink. If they do, we
might want to fix the original underlying problem and if they are causing 
problems in external systems, we also might think about other approaches to 
fix/handle the problem (write rate limiting?), which might be more general and 
not fixing only bursts originating from WindowOperator. I’m not saying that 
your proposal is bad or anything, but I would just like to have more context :)

Piotrek.

> On 26 Sep 2018, at 19:21, Rong Rong  wrote:
> 
> Hi Dev,
> 
> I was wondering if there's any previous discussion regarding how to handle
> burst network I/O when deploying Flink applications with window operators.
> 
> We've recently seen some significant network I/O degradation when trying to
> use sliding windows to perform rolling aggregations. The pattern is very
> periodic: output connections get no traffic for a period of time until a
> burst at window boundaries (in our case every 5 minutes).
> 
> We have drafted a doc
> <https://docs.google.com/document/d/1fEhbcRgxxX8zFYD_iMBG1DCbHmTcTRfRQFXelPhMFiY/edit?usp=sharing>
> on how we propose to handle it to smooth the output traffic spikes. Please
> kindly take a look, any comments and suggestions are highly appreciated.
> 
> --
> Rong