[jira] [Created] (FLINK-10188) Solve nondeterministic functions problem for retraction

2018-08-20 Thread Hequn Cheng (JIRA)
Hequn Cheng created FLINK-10188:
---

 Summary: Solve nondeterministic functions problem for retraction
 Key: FLINK-10188
 URL: https://issues.apache.org/jira/browse/FLINK-10188
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Reporter: Hequn Cheng


Currently, retraction does not take non-deterministic functions into account. For 
a SQL job like:
{code}
source -> group by -> 
 non-window join -> retract_sink
source -> group by -> 
{code}
The group by will send retract messages to the join. However, if we add 
LOCALTIMESTAMP between the group by and the join, messages cannot be retracted 
correctly in the join, since the join retracts messages by matching against the 
whole row.
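For illustration, a query of roughly this shape could trigger the problem (the 
table and column names are hypothetical, and {{tableEnv}} is assumed to be a 
{{StreamTableEnvironment}} with both sources registered):
{code:java}
// Sketch only: LOCALTIMESTAMP is re-evaluated when the retract message
// passes through the Calc between the group by and the join, so the
// retract row no longer equals the accumulate row stored in the join state.
Table result = tableEnv.sqlQuery(
    "SELECT a.k, a.cnt AS lcnt, a.ts, b.cnt AS rcnt " +
    "FROM (SELECT k, COUNT(*) AS cnt, LOCALTIMESTAMP AS ts FROM source1 GROUP BY k) a " +
    "JOIN (SELECT k, COUNT(*) AS cnt FROM source2 GROUP BY k) b " +
    "ON a.k = b.k");
{code}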



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10187) Fix LogicalUnnestRule to match Correlate/Uncollect correctly

2018-08-20 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-10187:
--

 Summary: Fix LogicalUnnestRule to match Correlate/Uncollect 
correctly
 Key: FLINK-10187
 URL: https://issues.apache.org/jira/browse/FLINK-10187
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Affects Versions: 1.6.0
Reporter: Shuyi Chen
Assignee: Shuyi Chen






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10186) FindBugs warnings: Random object created and used only once

2018-08-20 Thread Hiroaki Yoshida (JIRA)
Hiroaki Yoshida created FLINK-10186:
---

 Summary: FindBugs warnings: Random object created and used only 
once
 Key: FLINK-10186
 URL: https://issues.apache.org/jira/browse/FLINK-10186
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Reporter: Hiroaki Yoshida


FindBugs-3.0.1 ([http://findbugs.sourceforge.net/]) reported a 
DMI_RANDOM_USED_ONLY_ONCE warning on master:
{code:java}
H B DMI: Random object created and used only once in new 
org.apache.flink.streaming.runtime.io.BufferSpiller(IOManager, int)  At 
BufferSpiller.java:[line 118]
{code}
The description of the bug is as follows:
{quote}*DMI: Random object created and used only once 
(DMI_RANDOM_USED_ONLY_ONCE)*
This code creates a java.util.Random object, uses it to generate one random 
number, and then discards the Random object. This produces mediocre quality 
random numbers and is inefficient. If possible, rewrite the code so that the 
Random object is created once and saved, and each time a new random number is 
required invoke a method on the existing Random object to obtain it.

If it is important that the generated Random numbers not be guessable, you must 
not create a new Random for each random number; the values are too easily 
guessable. You should strongly consider using a java.security.SecureRandom 
instead (and avoid allocating a new SecureRandom for each random number needed).
[http://findbugs.sourceforge.net/bugDescriptions.html#DMI_RANDOM_USED_ONLY_ONCE]
{quote}
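For illustration, a minimal sketch of the pattern FindBugs suggests (the class 
and method names here are hypothetical, not the actual {{BufferSpiller}} code):
{code:java}
import java.util.Random;

public class SpillNameGenerator {
    // Created once and reused for every call, as the FindBugs description suggests.
    private static final Random RND = new Random();

    public String nextSpillFileName() {
        // Reuses the shared instance instead of allocating `new Random()` per call.
        return "spill-" + RND.nextInt(Integer.MAX_VALUE) + ".buffer";
    }
}
{code}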



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10185) Make ZooKeeperStateHandleStore#releaseAndTryRemove synchronous

2018-08-20 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-10185:
-

 Summary: Make ZooKeeperStateHandleStore#releaseAndTryRemove 
synchronous
 Key: FLINK-10185
 URL: https://issues.apache.org/jira/browse/FLINK-10185
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Coordination
Affects Versions: 1.6.0, 1.5.2, 1.4.2, 1.7.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
 Fix For: 1.4.3, 1.6.1, 1.7.0, 1.5.4


The {{ZooKeeperStateHandleStore#releaseAndTryRemove}} method executes parts of 
its logic synchronously (retrieving the state handle and unlocking the ZNode) 
and others asynchronously (removing the ZNode). Moreover, the method takes a 
parameter which is used to execute some logic in case of a successful removal. 
This was done in order to execute a potentially blocking state discard 
operation in a different thread.

I think this can be simplified by executing all logic in the same thread and 
running the callback after having called 
{{ZooKeeperStateHandleStore#releaseAndTryRemove}}. Moreover, if this operation 
needs to be non-blocking, one could call into this method from a different 
thread. 
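A minimal sketch of the proposed call pattern (the signatures are illustrative, 
not the actual Flink API):
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

class RemovalSketch {
    interface StateHandleStore {
        // Assumed synchronous variant of releaseAndTryRemove.
        boolean releaseAndTryRemove(String pathInZooKeeper) throws Exception;
    }

    // Synchronous: the follow-up logic simply runs after the call returns.
    static void removeBlocking(StateHandleStore store, String path, Runnable discard) throws Exception {
        if (store.releaseAndTryRemove(path)) {
            discard.run();
        }
    }

    // Non-blocking: the caller moves the whole call onto its own executor.
    static CompletableFuture<Void> removeAsync(StateHandleStore store, String path, Runnable discard, Executor executor) {
        return CompletableFuture.runAsync(() -> {
            try {
                removeBlocking(store, path, discard);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }, executor);
    }
}
{code}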



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10184) HA Failover broken due to JobGraphs not being removed from Zookeeper on cancel

2018-08-20 Thread Thomas Wozniakowski (JIRA)
Thomas Wozniakowski created FLINK-10184:
---

 Summary: HA Failover broken due to JobGraphs not being removed 
from Zookeeper on cancel
 Key: FLINK-10184
 URL: https://issues.apache.org/jira/browse/FLINK-10184
 Project: Flink
  Issue Type: Bug
  Components: Distributed Coordination
Affects Versions: 1.5.2
Reporter: Thomas Wozniakowski


We have encountered a blocking issue when upgrading our cluster to 1.5.2.

It appears that, when jobs are cancelled manually (in our case with a 
savepoint), the JobGraphs are NOT removed from the ZooKeeper {{jobgraphs}} node.

This means that if you start a job, cancel it, restart it, cancel it, etc., you 
will end up with many job graphs stored in ZooKeeper, but none of the 
corresponding blobs in the Flink HA directory.

When an HA failover occurs, the newly elected leader retrieves all of those old 
JobGraph objects from ZooKeeper, then goes looking for the corresponding blobs 
in the HA directory. The blobs are not there, so the JobManager explodes and the 
process dies.

At this point the cluster has to be fully stopped, the ZooKeeper {{jobgraphs}} 
node cleared out by hand, and all the JobManagers restarted.

I can see the following line in the JobManager logs:

{code}
2018-08-20 16:17:20,776 INFO  org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore  - Removed job graph 4e9a5a9d70ca99dbd394c35f8dfeda65 from ZooKeeper.
{code}

But looking in ZooKeeper, the {{4e9a5a9d70ca99dbd394c35f8dfeda65}} job graph is 
still very much there.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10183) Processing-Time based windows aren't emitted when a finite stream ends

2018-08-20 Thread Andrew Roberts (JIRA)
Andrew Roberts created FLINK-10183:
--

 Summary: Processing-Time based windows aren't emitted when a 
finite stream ends
 Key: FLINK-10183
 URL: https://issues.apache.org/jira/browse/FLINK-10183
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.6.0
Reporter: Andrew Roberts


It looks like event-time windows are flushed by the final watermark added in 
FLINK-3554, but in-progress processing-time windows are dropped on the 
floor once a finite stream ends.
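A hedged reproduction sketch (assuming a small finite source; the API names are 
those of the DataStream API in 1.6):
{code:java}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class ProcTimeWindowRepro {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements(1, 2, 3, 4, 5)
           .windowAll(TumblingProcessingTimeWindows.of(Time.minutes(1)))
           .reduce((a, b) -> a + b)
           // Expected 15, but the job finishes before the processing-time
           // timer fires, so the in-progress window is never emitted.
           .print();
        env.execute();
    }
}
{code}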



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] Release 1.5.3, release candidate #1

2018-08-20 Thread Tzu-Li (Gordon) Tai
+1 (binding)

- verified checksum and gpg files
- verified source compiles (tests enabled), Scala 2.11 / without Hadoop
- e2e tests pass locally
- source release contains no binaries
- no missing release artifacts in staging area
- reviewed announcement PR, is LGTM

Cheers,
Gordon


On Sat, Aug 18, 2018 at 7:56 PM vino yang  wrote:

> +1,
>
> - checked out the source code of Flink v1.5.3-rc1 and packaged it (skipping
> tests) successfully
> - reviewed the announcement blog post and gave a little comment
> - ran flink-table's tests successfully
> - checked that all modules' version number is 1.5.3
>
> Thanks, vino.
>
> Till Rohrmann wrote on Sat, Aug 18, 2018, at 12:25 AM:
>
> > +1 (binding)
> >
> > - built Flink from source release with Hadoop version 2.8.4
> > - executed all end-to-end tests successfully
> > - executed Jepsen test suite successfully with binary release
> >
> > Cheers,
> > Till
> >
> > On Thu, Aug 16, 2018, 12:59 Chesnay Schepler  wrote:
> >
> > > Hi everyone,
> > > Please review and vote on the release candidate #1 for the version
> > > 1.5.3, as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > >
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source release and binary convenience releases to
> > > be deployed to dist.apache.org [2], which are signed with the key with
> > > fingerprint 11D464BA [3],
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > * source code tag "release-1.5.3-rc1" [5],
> > > * website pull request listing the new release and adding announcement
> > > blog post [6].
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PMC affirmative votes.
> > >
> > > Thanks,
> > > Chesnay
> > >
> > > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12343777
> > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.5.3/
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4] https://repository.apache.org/content/repositories/orgapacheflink-1179
> > > [5] https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=refs/tags/release-1.5.3-rc1
> > > [6] https://github.com/apache/flink-web/pull/119


[jira] [Created] (FLINK-10182) AsynchronousBufferFileWriterTest deadlocks on travis

2018-08-20 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-10182:


 Summary: AsynchronousBufferFileWriterTest deadlocks on travis
 Key: FLINK-10182
 URL: https://issues.apache.org/jira/browse/FLINK-10182
 Project: Flink
  Issue Type: Bug
  Components: Network, Tests
Affects Versions: 1.7.0
Reporter: Chesnay Schepler


https://travis-ci.org/apache/flink/jobs/415811738



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10181) Add anchor link to individual rest requests

2018-08-20 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-10181:


 Summary: Add anchor link to individual rest requests
 Key: FLINK-10181
 URL: https://issues.apache.org/jira/browse/FLINK-10181
 Project: Flink
  Issue Type: Improvement
  Components: Documentation, REST
Affects Versions: 1.7.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.7.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10180) .bat script to start Flink standalone job

2018-08-20 Thread Pavel Shvetsov (JIRA)
Pavel Shvetsov created FLINK-10180:
--

 Summary: .bat script to start Flink standalone job
 Key: FLINK-10180
 URL: https://issues.apache.org/jira/browse/FLINK-10180
 Project: Flink
  Issue Type: Sub-task
  Components: Startup Shell Scripts
Reporter: Pavel Shvetsov
Assignee: Pavel Shvetsov


Create .bat script to start Flink standalone job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10179) .bat script to start Flink history server

2018-08-20 Thread Pavel Shvetsov (JIRA)
Pavel Shvetsov created FLINK-10179:
--

 Summary: .bat script to start Flink history server
 Key: FLINK-10179
 URL: https://issues.apache.org/jira/browse/FLINK-10179
 Project: Flink
  Issue Type: Sub-task
  Components: Startup Shell Scripts
Reporter: Pavel Shvetsov
Assignee: Pavel Shvetsov


Create .bat script to start Flink history server.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10178) Job cancel REST API not working

2018-08-20 Thread Marc Rooding (JIRA)
Marc Rooding created FLINK-10178:


 Summary: Job cancel REST API not working
 Key: FLINK-10178
 URL: https://issues.apache.org/jira/browse/FLINK-10178
 Project: Flink
  Issue Type: Bug
  Components: REST
Affects Versions: 1.6.0, 1.5.2, 1.5.1
 Environment: Running as a Docker container using the default flink 
Docker images.
Reporter: Marc Rooding


I've been trying to work with the Flink REST API to cancel a running job. 

When I deploy a simple job, I can retrieve the job information using the 
*jobs/overview* endpoint, which returns:
{code:java}
{
  "jobs": [
{
  "jid": "f907f847451cfd9231b7d3c0662b149b",
  "name": "Windowed WordCount",
  "state": "RUNNING",
  "start-time": 1534770489437,
  "end-time": -1,
  "duration": 72682,
  "last-modification": 1534770489942,
  "tasks": {
"total": 6,
"created": 0,
"scheduled": 0,
"deploying": 0,
"running": 6,
"finished": 0,
"canceling": 0,
"canceled": 0,
"failed": 0,
"reconciling": 0
  }
}
  ]
}
{code}
I can also request more information about the job using the 
*jobs/f907f847451cfd9231b7d3c0662b149b* endpoint.

According to the documentation, I should be able to do a DELETE request to 
*jobs/f907f847451cfd9231b7d3c0662b149b/cancel*. Doing so returns a 404 with the 
following body:
{code:java}
{"errors":["Not found."]}{code}

I also tried a GET request to 
*jobs/f907f847451cfd9231b7d3c0662b149b/cancel-with-savepoint/*. That simply 
returns a 404 without a body.

I tried it with Flink 1.5.1, 1.5.2 and 1.6.0 and got consistent results.

I looked at which API is being used by the Flink web UI when pressing the 
'Cancel' button, and that one seems to go to 
*jobs/f907f847451cfd9231b7d3c0662b149b/yarn-cancel*. If I do a GET request to 
that one manually, it does actually cancel the job.

I've been looking into the Flink source code but couldn't find an immediate 
clue as to what's going on. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Support Hadoop 2.6 for StreamingFileSink

2018-08-20 Thread Artsem Semianenka
I have an idea to create a new version of the HadoopRecoverableFsDataOutputStream
class (for example named LegacyHadoopRecoverableFsDataOutputStream :) )
which works with valid-length files without invoking truncate, and to
modify the check in HadoopRecoverableWriter to use
LegacyHadoopRecoverableFsDataOutputStream when the Hadoop version is
lower than 2.7. I will try to provide a PR soon if there are no objections.
I hope I am on the right track.
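
A rough sketch of the version check I have in mind (illustrative only, not
the final code; it relies on HDFS only supporting truncate since Hadoop 2.7,
HDFS-3107):

import org.apache.hadoop.util.VersionInfo;

public class TruncateSupport {
    // Below 2.7 we would fall back to writing valid-length files
    // instead of truncating on recovery.
    public static boolean truncateSupported() {
        String[] v = VersionInfo.getVersion().split("\\.");
        int major = Integer.parseInt(v[0]);
        int minor = Integer.parseInt(v[1]);
        return major > 2 || (major == 2 && minor >= 7);
    }
}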

On Mon, 20 Aug 2018 at 14:40, Artsem Semianenka 
wrote:

> Hi guys!
> I have a question regarding the new StreamingFileSink (introduced in
> version 1.6). We use this sink to write data in Parquet format, but I ran
> into an issue when trying to run a job on a YARN cluster and save the result
> to HDFS. In our case we use the latest Cloudera distribution (CDH 5.15),
> which contains HDFS 2.6.0. This version does not support the truncate
> method. I would like to create a pull request, but I want to ask your advice
> on how best to design this fix and which ideas are behind this decision. I
> saw a similar PR for BucketingSink: https://github.com/apache/flink/pull/6108.
> Maybe I could also add support for valid-length files for older Hadoop
> versions?
>
> P.S. Unfortunately CDH 5.15 (with Hadoop 2.6) is the latest version of the
> Cloudera distribution and we can't upgrade Hadoop to 2.7.
>
> Best regards,
> Artsem
>


-- 

Best regards,
Artsem Semianenka


Re: [DISCUSS][TABLE] How to handle empty delete for UpsertSource

2018-08-20 Thread Piotr Nowojski
Hi,

Thanks for bringing up this issue here.

I’m not sure whether sometimes swallowing empty deletes could be a problem, or 
whether always swallowing/forwarding them is better. I guess for most use cases 
it doesn't matter. Maybe the best for now would be to always forward them: if 
they are a problem, the user could handle them somehow, either in a custom sink 
wrapper or in a system that’s downstream from Flink. Also, maybe we could make 
this configurable in the future.

However, this seems to me a much lower priority compared to the performance 
implications. Forcing an upsert source to always keep all of the keys in state 
is not only costly, but in many cases can be a blocker to executing a query at 
all. That holds not only for UpsertSource -> Calc -> UpsertSink, but in the 
future also, for example, for joins or ORDER BY (especially with LIMIT).

I would apply the same reasoning to FLINK-9528.

Piotrek

> On 19 Aug 2018, at 08:21, Hequn Cheng  wrote:
> 
> Hi all,
> 
> Currently, I am working on FLINK-8577 Implement proctime DataStream to
> Table upsert conversion, and a design doc can be found here.
> It received many valuable suggestions. Many thanks to all of you.
> However there are some problems I think may need more discussion.
> 
> *Terms*
> 
>   1. *Upsert Stream:* Stream that includes a key definition and will be
>   updated. Message types include insert, update and delete.
>   2. *Upsert Source:* Source that ingests an Upsert Stream.
>   3. *Empty Delete:* For a specific key, the first message is a delete
>   message.
> 
> *Problem to be discussed*
> How to handle empty deletes for UpsertSource?
> 
> *Ways to solve the problem*
> 
>   1. Throw away empty delete messages in the UpsertSource (personally in
>   favor of this option)
>      - Advantages
>        - This makes sense semantically: an empty table + a delete message is
>        still an empty table, so losing the deletes does not affect the final
>        results.
>        - At present, the operators and functions in Flink are assumed to
>        process the add message first and then the delete. Throwing away
>        empty deletes in the source means the downstream operators do not
>        need to consider empty deletes.
>      - Disadvantages
>        - Maintaining the state in the source is expensive, especially for
>        simple SQL like: UpsertSource -> Calc -> UpsertSink.
>   2. Throw away empty delete messages when the source generates
>   retractions, otherwise pass empty delete messages down
>      - Advantages
>        - The downstream operator does not need to consider empty delete
>        messages when the source generates retractions.
>        - Performance is better, since the source doesn't have to maintain
>        state if it doesn't generate retractions.
>      - Disadvantages
>        - Judging whether a downstream operator will receive empty delete
>        messages is complicated: not only the source must be taken into
>        consideration, but also the operators that follow it. Take join as an
>        example: for the SQL upsert_source -> upsert_join, the join receives
>        empty deletes, while for upsert_source -> group_by -> upsert_join it
>        doesn't, since the empty deletes are absorbed by the group_by.
>        - The semantics of how empty deletes are processed are unclear. This
>        may be difficult for users to understand, because sometimes empty
>        deletes are passed down and sometimes they aren't.
>   3. Pass empty delete messages down always
>      - Advantages
>        - Performance is better, since the source doesn't have to maintain
>        state if it doesn't generate retractions.
>      - Disadvantages
>        - All table operators and functions in Flink need to consider empty
>        deletes.
> 
> *Another related problem*
> Another related problem is FLINK-9528 (Incorrect results: Filter does not
> treat Upsert messages correctly), which I think should be
> considered together.
> The problem in FLINK-9528 is: for SQL like upsert_source -> filter ->
> upsert_sink, when the data of a key changes from non-filtering to
> filtering, the filter only removes the upsert message, such that the
> previous version remains in the result.
> 
>   1. One way to solve the problem is to make the UpsertSource generate
>   retractions.
>   2. Another way is to make the filter aware of the update semantics
>   (retract or upsert) and convert the upsert message into a delete message if
>   the predicate evaluates to false.
> 
> The second way will also generate many empty delete messages. To avoid too
> many empty deletes, the solution is to maintain a filter state at the sink to
> prevent the empty deletes from causing devastating pressure on the physical
> database. However, if U

Support Hadoop 2.6 for StreamingFileSink

2018-08-20 Thread Artsem Semianenka
Hi guys!
I have a question regarding the new StreamingFileSink (introduced in
version 1.6). We use this sink to write data in Parquet format, but I ran
into an issue when trying to run a job on a YARN cluster and save the result
to HDFS. In our case we use the latest Cloudera distribution (CDH 5.15),
which contains HDFS 2.6.0. This version does not support the truncate
method. I would like to create a pull request, but I want to ask your advice
on how best to design this fix and which ideas are behind this decision. I
saw a similar PR for BucketingSink: https://github.com/apache/flink/pull/6108.
Maybe I could also add support for valid-length files for older Hadoop
versions?

P.S. Unfortunately CDH 5.15 (with Hadoop 2.6) is the latest version of the
Cloudera distribution and we can't upgrade Hadoop to 2.7.

Best regards,
Artsem


SQL Client Limitations

2018-08-20 Thread Dominik Wosiński
Hey,

Do we have a list of the current limitations of the SQL Client available
somewhere, or is the only way to go through JIRA issues?

For example:
I tried to combine a GROUP BY tumble window and an inner join in one query
(a sketch of what I mean is below) and it seems that it is not currently
possible. I was wondering whether it's an issue with my query or a known
limitation.
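
For reference, a sketch of the kind of query I mean (the table and column
names are made up; tableEnv is a StreamTableEnvironment with the tables
registered):

// Tumbling-window aggregation combined with an inner join in one statement.
Table result = tableEnv.sqlQuery(
    "SELECT u.name, TUMBLE_END(o.rowtime, INTERVAL '1' HOUR) AS w, COUNT(*) AS cnt " +
    "FROM Orders o JOIN Users u ON o.userId = u.userId " +
    "GROUP BY u.name, TUMBLE(o.rowtime, INTERVAL '1' HOUR)");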

Thanks,
Best Regards,
Dominik.


[jira] [Created] (FLINK-10177) Use transport type AUTO by default

2018-08-20 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-10177:
---

 Summary: Use transport type AUTO by default
 Key: FLINK-10177
 URL: https://issues.apache.org/jira/browse/FLINK-10177
 Project: Flink
  Issue Type: Improvement
  Components: Configuration, Network
Affects Versions: 1.6.0, 1.7.0
Reporter: Nico Kruber


Now that the shading issue with the native library is fixed (FLINK-9463), EPOLL 
should be available on (all?) Linux distributions and provide some efficiency 
gain (if enabled). Therefore, {{taskmanager.network.netty.transport}} should be 
set to {{auto}} by default. If EPOLL is not available, it will automatically 
fall back to NIO, which is currently the default.
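A sketch of what the proposed default would look like in {{flink-conf.yaml}}:
{code}
# proposed default; today NIO is used unless this is set explicitly
taskmanager.network.netty.transport: auto
{code}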



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Rust add adapter for parquet

2018-08-20 Thread Renjie Liu
This thread is for Arrow; I'm sorry for sending it to the Flink community by
mistake.

On Mon, Aug 20, 2018 at 3:57 PM Fabian Hueske  wrote:

> Hi Renjie,
>
> Did you intend to send this mail to dev@arrow.a.o instead of dev@flink.a.o
> ?
>
> Best, Fabian
>
> 2018-08-20 4:39 GMT+02:00 Renjie Liu :
>
> > cc:Sunchao and Any
> >
> > -- Forwarded message -
> > From: Uwe L. Korn 
> > Date: Sun, Aug 19, 2018 at 5:08 PM
> > Subject: Re: [DISCUSS] Rust add adapter for parquet
> > To: 
> >
> >
> > Hello,
> >
> > you might also want to raise this with the
> > https://github.com/sunchao/parquet-rs project. The overlap between the
> > developers of this project and the Arrow Rust implementation is quite
> large
> > but still it may make sense to also start a discussion there.
> >
> > Uwe
> >
> > On Thu, Aug 16, 2018, at 9:14 AM, Renjie Liu wrote:
> > > Hi, all:
> > >
> > > Now the Rust component is approaching a stable state and the Rust
> > > reader for Parquet is ready. I think it may be a good time to start an
> > > adapter for Parquet, just like the adapter for ORC in C++. What do you
> > > guys think about it?
> > > --
> > > Liu, Renjie
> > > Software Engineer, MVAD
> > --
> > Liu, Renjie
> > Software Engineer, MVAD
> >
>
-- 
Liu, Renjie
Software Engineer, MVAD


[jira] [Created] (FLINK-10176) Replace ByteArrayData[Input|Output]View with Data[Output|InputDe]Serializer

2018-08-20 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10176:
--

 Summary: Replace ByteArrayData[Input|Output]View with 
Data[Output|InputDe]Serializer
 Key: FLINK-10176
 URL: https://issues.apache.org/jira/browse/FLINK-10176
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.7.0
Reporter: Stefan Richter
Assignee: Stefan Richter


I have found that the functionality of {{ByteArrayData[Input|Output]View}} is 
very similar to that of the already existing {{Data[Output|InputDe]Serializer}}. 
With some very small additions, we can replace the former with the latter.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10175) Fix concurrent access to shared buffer in map state / queryable state

2018-08-20 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10175:
--

 Summary: Fix concurrent access to shared buffer in map state / 
queryable state
 Key: FLINK-10175
 URL: https://issues.apache.org/jira/browse/FLINK-10175
 Project: Flink
  Issue Type: Bug
  Components: State Backends, Checkpointing
Affects Versions: 1.7.0
Reporter: Stefan Richter
Assignee: Stefan Richter


Accidental sharing of buffers between the event-processing loop and the 
queryable-state thread can happen in {{RocksDBMapState::deserializeUserKey}}. 
Queryable state should provide a separate buffer.
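A generic sketch of the fix pattern (illustrative only, not the actual 
{{RocksDBMapState}} code): instead of one mutable buffer shared between the two 
threads, each thread gets its own.
{code:java}
import java.nio.ByteBuffer;

class PerThreadBuffer {
    // Each thread sees its own instance, so a concurrent reset or
    // overwrite from the other thread is impossible.
    private static final ThreadLocal<ByteBuffer> BUFFER =
        ThreadLocal.withInitial(() -> ByteBuffer.allocate(4096));

    static ByteBuffer buffer() {
        ByteBuffer b = BUFFER.get();
        b.clear(); // reset position/limit for reuse
        return b;
    }
}
{code}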



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [Proposal] Utilities for reading, transforming and creating Streaming savepoints

2018-08-20 Thread Aljoscha Krettek
+1 I'd like to have something like this in Flink a lot!

> On 19. Aug 2018, at 11:57, Gyula Fóra  wrote:
> 
> Hi all!
> 
> Thanks for the feedback and I'm happy there is some interest :)
> Tomorrow I will start improving the proposal based on the feedback and will
> get back to work.
> 
> If you are interested working together in this please ping me and we can
> discuss some ideas/plans and how to share work.
> 
> Cheers,
> Gyula
> 
> Paris Carbone wrote (on Sat, Aug 18, 2018, at 9:03):
> 
>> +1
>> 
>> Might also be a good start to implement queryable stream state with
>> snapshot isolation using that mechanism.
>> 
>> Paris
>> 
>>> On 17 Aug 2018, at 12:28, Gyula Fóra  wrote:
>>> 
>>> Hi All!
>>> 
>>> I want to share with you a little project we have been working on at King
>>> (with some help from some dataArtisans folks). I think this would be a
>>> valuable addition to Flink and solve a bunch of outstanding production
>>> use-cases and headaches around state bootstrapping and state analytics.
>>> 
>>> We have built a quick and dirty POC implementation on top of Flink 1.6,
>>> please check the README for some nice examples to get a quick idea:
>>> 
>>> https://github.com/king/bravo
>>> 
>>> *Short story*
>>> Bravo is a convenient state reader and writer library leveraging
>>> Flink’s batch processing capabilities. It supports processing and writing
>>> Flink streaming savepoints. At the moment it only supports processing
>>> RocksDB savepoints but this can be extended in the future for other state
>>> backends and checkpoint types.
>>> 
>>> Our goal is to cover a few basic features:
>>> 
>>>  - Converting keyed states to Flink DataSets for processing and
>>>  analytics
>>>  - Reading/Writing non-keyed operators states
>>>  - Bootstrap keyed states from Flink DataSets and create new valid
>>>  savepoints
>>>  - Transform existing savepoints by replacing/changing some states
>>> 
>>> 
>>> Some example use-cases:
>>> 
>>>  - Point-in-time state analytics across all operators and keys
>>>  - Bootstrap state of a streaming job from external resources such as
>>>  reading from database/filesystem
>>>  - Validate and potentially repair corrupted state of a streaming job
>>>  - Change max parallelism of a job
>>> 
>>> 
>>> Our main goal is to start working together with other Flink production
>>> users and make this something useful that can be part of Flink. So if you
>>> have use-cases please talk to us :)
>>> I have also started a google doc which contains a little bit more info
>>> than the readme and could be a starting place for discussions:
>>> 
>>> 
>> https://docs.google.com/document/d/103k6wPX20kMu5H3SOOXSg5PZIaYpwdhqBMr-ppkFL5E/edit?usp=sharing
>>> 
>>> I know there are a bunch of rough edges and bugs (and no tests) but our
>>> motto is: If you are not embarrassed, you released too late :)
>>> 
>>> Please let me know what you think!
>>> 
>>> Cheers,
>>> Gyula
>> 
>> 



Re: [DISCUSS] Rust add adapter for parquet

2018-08-20 Thread Fabian Hueske
Hi Renjie,

Did you intend to send this mail to dev@arrow.a.o instead of dev@flink.a.o?

Best, Fabian

2018-08-20 4:39 GMT+02:00 Renjie Liu :

> cc:Sunchao and Any
>
> -- Forwarded message -
> From: Uwe L. Korn 
> Date: Sun, Aug 19, 2018 at 5:08 PM
> Subject: Re: [DISCUSS] Rust add adapter for parquet
> To: 
>
>
> Hello,
>
> you might also want to raise this with the
> https://github.com/sunchao/parquet-rs project. The overlap between the
> developers of this project and the Arrow Rust implementation is quite large
> but still it may make sense to also start a discussion there.
>
> Uwe
>
> On Thu, Aug 16, 2018, at 9:14 AM, Renjie Liu wrote:
> > Hi, all:
> >
> > Now the Rust component is approaching a stable state and the Rust reader
> > for Parquet is ready. I think it may be a good time to start an adapter
> > for Parquet, just like the adapter for ORC in C++. What do you guys think
> > about it?
> > --
> > Liu, Renjie
> > Software Engineer, MVAD
> --
> Liu, Renjie
> Software Engineer, MVAD
>


[jira] [Created] (FLINK-10174) getBytes with no charset causes test errors for hex and toBase64

2018-08-20 Thread xueyu (JIRA)
xueyu created FLINK-10174:
-

 Summary: getBytes with no charset causes test errors for hex and toBase64
 Key: FLINK-10174
 URL: https://issues.apache.org/jira/browse/FLINK-10174
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Reporter: xueyu
 Fix For: 1.7.0


The hex and toBase64 built-in methods use str.getBytes() with no Charset. The 
result may therefore depend on the local execution environment's default 
charset for special Unicode characters, and testing hex with special Unicode 
input may produce errors.
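A hedged illustration of the underlying JDK behavior (a standalone example, not 
the Flink code):
{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetExample {
    public static void main(String[] args) {
        // Platform-dependent: uses the JVM default charset (file.encoding),
        // so the bytes can differ between, e.g., UTF-8 and Latin-1 machines.
        byte[] platformDependent = "Ä".getBytes();
        // Deterministic: an explicit charset yields the same bytes everywhere.
        byte[] deterministic = "Ä".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(platformDependent));
        System.out.println(Arrays.toString(deterministic));
    }
}
{code}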





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)