Yes, but be aware that your program runs with parallelism 1 if you do not
configure the parallelism.
2016-05-03 11:07 GMT+02:00 Punit Naik :
> Hi Stephen, Fabian
>
> setting "fs.output.always-create-directory" to true in flink-config.yml
> worked!
>
> On Tue, May 3, 2016 at 2:27 PM, Stephan Ewen
tors are not supposed to be implemented outside of
> Flink.
>
> Thanks,
>
> Simone
>
> 2016-04-29 21:32 GMT+02:00 Fabian Hueske :
>
>> Hi Simone,
>>
>> the GraphCreatingVisitor transforms the common operator plan into a
>> representation that is tr
Hi Flavio,
I thought a bit about your proposal. I am not sure if it is actually
necessary to integrate a central source repository into Flink. It should be
possible to offer this as an external service which is based on the
recently added TableSource interface. TableSources could be extended to be
I'm not so much familiar with the Kafka connector.
Can you post your suggestion to the user or dev mailing list?
Thanks, Fabian
2016-05-04 16:53 GMT+02:00 Sendoh :
> Glad to see it's developing.
> Can I ask would the same feature (reconnect) be useful for Kafka connector
> ?
> For example, if th
Sorry, I confused the mail threads. We're already on the user list :-)
Thanks for the suggestion.
2016-05-04 17:35 GMT+02:00 Fabian Hueske :
> I'm not so much familiar with the Kafka connector.
> Can you post your suggestion to the user or dev mailing list?
>
> Thanks, Fa
Hi Max,
it is not possible to deactivate spilling to disk at the moment.
It might be possible to implement, but this would require a few more
changes to make it feasible.
For instance, we would need to add more fine-grained control about how
memory is distributed among operators.
This is currently
define a link from operators to TableEnvironment and then to TableSource
> (using the lineage tag/source-id you said) and, finally to its metadata. I
> don't know whether this is specific only to us, I just wanted to share our
> needs and see if the table API development could b
Hi Andrea,
you can use any OutputFormat to emit data from a DataStream using the
writeUsingOutputFormat() method.
However, this method does not guarantee exactly-once processing. In case of
a failure, it might emit some records a second time. Hence the results will
be written at least once.
Hope
Hi Palle,
this sounds indeed like a good use case for Flink.
Depending on the complexity of the aggregated historical views, you can
implement a Flink DataStream program which builds the views on the fly,
i.e., you do not need to periodically trigger MR/Flink/Spark batch jobs to
compute the views
iate into
> RichSinkFunction's open method)
>
> Am I wrong?
>
> Thanks again,
> Andrea
>
> 2016-05-06 13:47 GMT+02:00 Fabian Hueske :
>
>> Hi Andrea,
>>
>> you can use any OutputFormat to emit data from a DataStream using the
>> writeUsingOut
gt; What do you think? Is there the possibility to open a broadcasted Dataset
> as a Map instead of a List?
>
> Best,
> Flavio
>
>
> On Fri, May 6, 2016 at 12:06 PM, Fabian Hueske wrote:
>
>> Hi Flavio,
>>
>> I'll open a JIRA for de/
Hi Palle,
you can recursively read all files in a folder as explained in the
"Recursive Traversal of the Input Path Directory" section of the Data
Source documentation [1].
The easiest way to read line-wise JSON objects is to use
ExecutionEnvironment.readTextFile() which reads text files linewise
id, I'm still not sure if it is really required to implement a
> custom runtime operator but given the complexity of the integration of two
> distribute systems, we assumed that more control would allow more
> flexibility and possibilities to achieve an ideal solution.
>
>
>
&g
Maybe the last example of this blog post is helpful [1].
Best, Fabian
[1]
https://www.mapr.com/blog/essential-guide-streaming-first-processing-apache-flink
2016-05-10 17:24 GMT+02:00 Srikanth :
> Hi,
>
> I read the following in Flink doc "We can explicitly specify a Trigger to
> overwrite the d
ion or at
> least support us in the development. We are waiting for them to start a
> discussion and as soon as we will have a more clear idea on how to proceed,
> we will validate it with the stuff you just said. Your confidence in
> Flink's operators gives up hope to achieve a clean
Hi Martin,
You can use a FoldFunction and a WindowFunction to process the same!
window. The FoldFunction is eagerly applied, so the window state is only
one element. When the window is closed, the aggregated element is given to
the WindowFunction where you can add start and end time. The iterator
Hi Tarandeep,
the AvroInputFormat was recently extended to support GenericRecords. [1]
You could also try to run the latest SNAPSHOT version and see if it works
for you.
Cheers, Fabian
[1] https://issues.apache.org/jira/browse/FLINK-3691
2016-05-12 10:05 GMT+02:00 Tarandeep Singh :
> I think I
Hi,
Flink's exactly-once semantics do not mean that events are processed
exactly-once but that events will contribute exactly-once to the state of
an operator such as a counter.
Roughly, the mechanism works as follows:
- Flink peridically injects checkpoint markers into the data stream. This
happe
master/apis/streaming/fault_tolerance.html#fault-tolerance-guarantees-of-data-sources-and-sinks
>
> If not what other Sinks can I use to have the exactly once output since
> getting exactly once output is critical for our use case.
>
>
>
> Thanks,
> Naveen
>
> From: Fabian Hueske
>
t; pipeline.
>
>
>
> Thanks,
> Naveen
>
> From: Fabian Hueske
> Reply-To: "user@flink.apache.org"
> Date: Friday, May 13, 2016 at 4:26 PM
>
> To: "user@flink.apache.org"
> Subject: Re: Flink recovery
>
> Hi Naveen,
>
> the Ro
Hi Prateek,
the missing numbers are an artifact from how the stats are collected.
ATM, Flink does only collect these metrics for data which is sent over
connections *between* Flink operators.
Since sources and sinks connect to external systems (and not Flink
operators), the dash board does not sho
".valid-length" file.
>>>
>>> The fix you mentioned is part of later Flink releases (like 1.0.3)
>>>
>>> Stephan
>>>
>>>
>>> On Mon, May 16, 2016 at 11:46 PM, Madhire, Naveen <
>>> naveen.madh...@capitalone
I think union is what you are looking for.
Note that all data sets must be of the same type.
2016-05-18 16:15 GMT+02:00 Ritesh Kumar Singh :
> Hi,
>
> How can I perform a reduce operation on a group of datasets using Flink?
> Let's say my map function gives out n datasets: d1, d2, ... dN
> Now I
I think that sentence is misleading and refers to the internals of Flink.
It should be removed, IMO.
You can only union two DataSets. If you want to union more, you have to do
it one by one.
Btw. union does not cause additional processing overhead.
Cheers, Fabian
2016-05-19 14:44 GMT+02:00 Rites
The problem seems to occur quite often.
Did you update your Flink version recently? If so, could you try to
downgrade and see if the problem disappears.
Is it otherwise possible that it is cause by faulty hardware?
2016-05-20 18:05 GMT+02:00 Flavio Pompermaier :
> This time (Europed instead of E
Actually, the program works correctly (according to the DataStream API)
Let me explain what happens:
1) You do not initialize the count variable, so it will be 0 (summing 0s
results in 0)
2) DataStreams are considered to be unbound (have an infinite size). KeyBy
does not group the records because
Hi Kirsti,
I'm not aware of anybody working on this issue.
Would you like to create a JIRA issue for it?
Best, Fabian
2016-05-23 16:56 GMT+02:00 KirstiLaurila :
> Is there any plans to implement this kind of feature (possibility to write
> to
> data specified partitions) in the near future?
>
>
No, that is not supported yet.
Beam provides a common API but the Flink runner translates programs against
batch sources into the DataSet API programs and Beam programs against
streaming source into DataStream programs.
It is not possible to mix both.
2016-05-26 10:00 GMT+02:00 Ashutosh Kumar :
>
Hi Elias,
yes, reduce, fold, and the aggregation functions (sum, min, max, minBy,
maxBy) on WindowedStream preform eager aggregation, i.e., the functions are
apply for each value that enters the window and the state of the window
will consist of a single value. In case you need access to the Windo
Hi Tarandeep,
the exception suggests that Flink tries to serialize RecordsFilterer as a
user function (this happens via Java Serialization).
I said suggests because the code that uses RecordsFilterer is not included.
To me it looks like RecordsFilterer should not be used as a user function.
It is
Hi Elias,
thanks for your feedback. I think those are good observations and
suggestions to improve the Kafka producers.
The best place to discuss such improvements is the dev mailing list.
Would like to repost your mail there or open JIRAs where the discussion
about these changes can continue?
T
Hi Ravikumar,
I'll try to answer your questions:
1) If you set the parallelism of a map function to 1, there will be only a
single instance of that function regardless whether it is execution locally
or remotely in a cluster.
2) Flink does also support aggregations, (reduce, groupReduce, combine,
Hi Yukun,
the problem is that the KeySelector is internally invoked multiple times.
Hence it must be deterministic, i.e., it must extract the same key for the
same object if invoked multiple times.
The documentation is not discussing this aspect and should be extended.
Thanks for pointing out thi
Hi,
you are computing a running aggregate, i.e., you're getting one output
record for each input record and the output record is the record with the
largest value observed so far.
If the record with the largest value is the first, the record is sent out
another time. This is what happened with Mat
Hi Christophe,
where does the backpressure appear? In front of the sink operator or before
the window operator?
In any case, I think you can improve your WindowFunction if you convert
parts of it into a FoldFunction.
The FoldFunction would take care of the statistics computation and the
WindowFun
ion Cleared.
>
> 4) My question was can I use same ExecutionEnvironment for all flink
> programs in a module.
>
> 5) Question Cleared.
>
>
> Regards
> Ravikumar
>
>
>
> On 9 June 2016 at 17:58, Fabian Hueske wrote:
>
>> Hi Ravikumar,
>>
>> I
We solved this problem yesterday at the Flink Hackathon.
The issue was that the source function was started with parallelism 4 and
each function read the whole file.
Cheers, Fabian
2016-06-06 16:53 GMT+02:00 Biplob Biswas :
> Hi,
>
> I tried streaming the source data 2 ways
>
> 1. Is a simple st
Great, thank you!
2016-06-09 17:38 GMT+02:00 Elias Levy :
>
> On Thu, Jun 9, 2016 at 5:16 AM, Fabian Hueske wrote:
>
>> thanks for your feedback. I think those are good observations and
>> suggestions to improve the Kafka producers.
>> The best place to discuss s
rwyck <
christophe.salperw...@gmail.com>:
> Hi Fabian,
>
> Thanks for the help, I will try that. The backpressure was on the source
> (HBase).
>
> Christophe
>
> 2016-06-09 16:38 GMT+02:00 Fabian Hueske :
>
>> Hi Christophe,
>>
>> where does the backpre
necessary since
> I don't really care about keys.
>
> On 9 June 2016 at 22:00, Fabian Hueske wrote:
>
>> Hi Yukun,
>>
>> the problem is that the KeySelector is internally invoked multiple times.
>> Hence it must be deterministic, i.e., it must extract the
nk 1.x release line.
>> >>
>> >> Cheers,
>> >> Aljoscha
>> >>
>> >> On Mon, 13 Jun 2016 at 11:04 Christophe Salperwyck
>> >> wrote:
>> >>>
>> >>> Thanks for the feedback and sorry that I can't try
Hi Josh,
I assume that you build the SNAPSHOT version yourself. I had similar
version conflicts for Apache HttpCore with Flink SNAPSHOT versions on EMR.
The problem is cause by a changed behavior in Maven 3.3 and following
versions.
Due to these changes, the dependency shading is not working corre
Yes, that was my fault. I'm used to auto reply-all on my desktop machine,
but my phone just did a simple reply.
Sorry for the confusion,
Fabian
2016-06-29 19:24 GMT+02:00 Ovidiu-Cristian MARCU <
ovidiu-cristian.ma...@inria.fr>:
> Thank you, Aljoscha!
> I received a similar update from Fabian, o
Thanks Ufuk and everybody who contributed to the release!
Cheers, Fabian
2016-08-08 20:41 GMT+02:00 Henry Saputra :
> Great work all. Great Thanks to Ufuk as RE :)
>
> On Monday, August 8, 2016, Stephan Ewen wrote:
>
> > Great work indeed, and big thanks, Ufuk!
> >
> > On Mon, Aug 8, 2016 at 6:
Hi Markus,
you might be right, that a lot of time is spend in optimization.
The optimizer enumerates all alternatives and chooses the plan with the
least estimated cost. The degrees of freedom of the optimizer are rather
restricted (execution strategies and the used partitioning & sorting keys.
Th
Hi Niels,
yes, in YARN mode, the default parallelism is the number of available slots.
You can change the default task parallelism like this:
1) Use the -p parameter when submitting a job via the CLI client [1]
2) Set a parallelism on the execution environment: env.setParallelism()
Best, Fabian
Hi Flavio,
yes, Joda should not be excluded.
This will be fixed in Flink 1.1.2.
Cheers, Fabian
2016-08-29 11:00 GMT+02:00 Flavio Pompermaier :
> Hi to all,
> I've tried to upgrade from Flink 1.0.2 to 1.1.1 so I've copied the
> excludes of the maven shade plugin from the java quickstart pom bu
Hi Paul,
This blog post [1] includes an example of an early trigger that should
pretty much do what you are looking for.
This one [2] explains the windowing mechanics of Flink (window assigner,
trigger, function, etc).
Hope this helps,
Fabian
[1]
https://www.mapr.com/blog/essential-guide-streami
Hi Steffen,
this looks like a Guava version mismatch to me.
Are you running exactly the same program on your local machine or did you
add dependencies to run it on the cluster (e.g. Kinesis).
Maybe Kinesis and Elasticsearch are using different Guava versions?
Best, Fabian
2016-09-01 10:45 GMT+02
us windows will not purge, is that correct?
>
> final DataStream alertingMsgs = keyedStream
> .window(TumblingEventTimeWindows.of(Time.minutes(1)))
> .trigger(CountTrigger.of(1))
> .apply(new MyWindowProcessor());
>
> Paul
&
state
>> backend since the state is not gone after checkpointing ?
>>
>> P.S I have kept the watermark behind by 1500 secs just to be safe on
>> handling late elements but to tackle edge case scenarios like the one
>> mentioned above we are having a backup plan of using Ca
the data to local TM disk,
> the retrieval will be faster here than Cassandra , right ?
>
> What do you think ?
>
>
> Regards,
> Vinay Patil
>
> On Thu, Sep 1, 2016 at 10:19 AM, Fabian Hueske-2 [via Apache Flink User
> Mailing List archive.] <[hidden email]
> <
ng and
> flatmap to take care of rest logic.
>
> Or are you suggestion to use Co-FlatMapFunction after the outer-join
> operation (I mean after doing the window and
> getting matchingAndNonMatching stream )?
>
> Regards,
> Vinay Patil
>
> On Thu, Sep 1, 2016 at 1
Hi Paul,
BoundedOutOfOrdernessTimestampExtractor implements the
AssignerWithPeriodicWatermarks interface.
This means, Flink will ask the assigner in regular intervals (configurable
via StreamExecutionEnvironment.getConfig().setAutoWatermarkInterval()) for
the current watermark.
The watermark will
rm in storing the
> DTO ?
>
> I think the documentation should specify the point that the state will be
> maintained for user-defined operators to avoid confusion.
>
> Regards,
> Vinay Patil
>
> On Thu, Sep 1, 2016 at 1:12 PM, Fabian Hueske-2 [via Apache Flink User
>
entTime - lastWaterMarkTime. So if (maxEventTime
> - lastWaterMarkTime) > x * 1000 then the window is evaluated?
>
>
> Paul
> ------
> *From:* Fabian Hueske
> *Sent:* Thursday, September 1, 2016 1:25:55 PM
> *To:* user@flink.apache.org
> *Subje
Thanks for the suggestion Vishnu!
Stackoverflow documentation looks great. I like the easy contribution and
versioning features.
However, I am a bit skeptical. IMO, Flink's primary documentation must be
hosted by Apache. Out-sourcing such an important aspect of a project to an
external service is
Hi,
Flink does not provide shared state.
However, you can broadcast a stream to CoFlatMapFunction, such that each
operator has its own local copy of the state.
If that does not work for you because the state is too large and if it is
possible to partition the state (and both streams), you can als
r.
> Is there a way?
>
> Best Regards
> CVP
>
> On Wed, Sep 7, 2016 at 5:00 PM, Fabian Hueske wrote:
>
>> Hi,
>>
>> Flink does not provide shared state.
>> However, you can broadcast a stream to CoFlatMapFunction, such that each
>> operator h
s job1 ?
>
>
>
> On Wed, Sep 7, 2016 at 6:33 PM, Fabian Hueske wrote:
>
>> Is writing DataStream2 to a Kafka topic and reading it from the other job
>> an option?
>>
>> 2016-09-07 19:07 GMT+02:00 Chakravarthy varaga
>> :
>>
>>> Hi Fabian,
keys' from DS2 and DS2 could shrink/expand in terms of the no., of
> keys will the key-value shard work in this case?
>
> On Wed, Sep 7, 2016 at 7:44 PM, Fabian Hueske wrote:
>
>> Operator state is always local in Flink. However, with key-value state,
>> you can h
Hi Pushpendra,
1. Queryable state is an upcoming feature and not part of an official
release yet. With queryable state you can query operator state from outside
the application.
2. Have you had a look at the CoFlatMap operator? This operator "connects"
two streams and allows to have state which i
I would assign timestamps directly at the source.
Timestamps are not striped of by operators.
Reassigning timestamps somewhere in the middle of a job can cause very
unexpected results.
2016-09-08 9:32 GMT+02:00 Dong-iL, Kim :
> Thanks for replying. pushpendra.
> The assignTimestamp method return
Hi Frank,
input should be of DataSet[(BSONWritable, BSONWritable)], so a
Tuple2[BSONWritable, BSONWritable], right?
Something like this should work:
input.map( pair => pair._1.toString )
Pair is a Tuple2[BSONWritable, BSONWritable], and pair._1 accesses the key
of the pair.
Alternatively you c
scala.reflect.ClassTag[R])org.apache.flink.api.scala.
> DataSet[R]
> match expected type ?
>
> Thanks!
> Frank
>
>
> On Thu, Sep 8, 2016 at 6:56 PM, Fabian Hueske wrote:
>
>> Hi Frank,
>>
>> input should be of DataSet[(BSONWritable, BSONWritable)], so
g and
> generates key1:, key2:, key3: keyN:
>
> Now,
> I wish to map elementKeyStream with look ups within (key1,
> key2...keyN) where key1, key2.. keyN and their respective values should be
> available across the cluster...
>
> Thanks a million !
> CVP
>
>
+1
I ran into that issue as well. Would be great to have that in the docs!
2016-09-09 11:49 GMT+02:00 Robert Metzger :
> Hi Steffen,
>
> I think it would be good to add it to the documentation.
> Would you like to open a pull request?
>
>
> Regards,
> Robert
>
>
> On Mon, Sep 5, 2016 at 10:26 PM,
No, this is not possible unless you use an external service such as a
database.
The assigners might run on different machines and Flink does not provide
utilities for r/w shared state.
Best, Fabian
2016-09-15 20:17 GMT+02:00 Saiph Kappa :
> And is it possible to share state across parallel insta
maintained in local disk even after checkpointing.
>
> Or I am not getting it correclty :)
>
> Regards,
> Vinay Patil
>
> On Thu, Sep 1, 2016 at 1:38 PM, Fabian Hueske-2 [via Apache Flink User
> Mailing List archive.] <[hidden email]
> <http:///user/SendEmail.jtp?type=node&a
Hi Radu,
you can pass the TypeInfo directly without accessing the TypeClass.
Have you tried this?
TypeInformation> tpinf = new TypeHint>(){}.getTypeInfo();
.toDataStream( , tpinf )
Best, Fabian
2016-09-19 17:53 GMT+02:00 Radu Tudoran :
> Hi,
>
>
>
> I am trying to create an sql statement th
Hi Janardhan,
to sure what's going wrong here. Maybe Till (in CC) has an idea?
Best, Fabian
2016-09-19 19:45 GMT+02:00 Janardhan Reddy :
> HI,
>
> I cancelled a restarting job from flink UI and the job is stuck in
> cancelling state. (Fixed delay restart strategy was configured for the
> job).
Thanks for looking into this Frank!
I opened FLINK-4636 [1] to track the issue.
Would you or Jaxbihani like to contribute a patch for this bug?
[1] https://issues.apache.org/jira/browse/FLINK-4636
2016-09-17 21:15 GMT+02:00 Frank Dekervel :
> Hello,
>
> looks like a bug ... when a PriorityQueu
Hi Luis,
this looks like a bug.
Can you open a JIRA [1] issue and provide a more detailed description of
what you do (Environment, DataStream / DataSet, how do you submit the
program, maybe add a small program that reproduce the problem on your
setup)?
Thanks, Fabian
2016-09-19 17:30 GMT+02:00 L
Hi Yukun,
I debugged this issue and found that this is a bug in the serialization of
the StateDescriptor.
I have created FLINK-4640 [1] to resolve the issue.
Thanks for reporting the issue.
Best, Fabian
[1] https://issues.apache.org/jira/browse/FLINK-4640
2016-09-20 10:35 GMT+02:00 Yukun Guo :
Yes, the condition needs to be fixed.
@Swapnil, would you like to create a JIRA issue and open a pull request to
fix it?
Thanks, Fabian
2016-09-20 11:22 GMT+02:00 Chesnay Schepler :
> I would agree that the condition should be changed.
>
>
> On 20.09.2016 10:52, Swapnil Chougule wrote:
>
>> I c
Hi Yassine, can you share a stacktrace of the job when it got stuck?
Thanks, Fabian
2016-09-22 14:03 GMT+02:00 Yassine MARZOUGUI :
> The input splits are correctly assgined. I noticed that whenever the job
> is stuck, that is because the task *Combine (GroupReduce at
> first(DataSet.java:573)) *
:584)
> at java.lang.Thread.run(Thread.java:745)
>
> Best,
> Yassine
>
>
> 2016-09-23 11:28 GMT+02:00 Yassine MARZOUGUI :
>
>> Hi Fabian,
>>
>> Is it different from the output I already sent? (see attached file). If
>> yes, how can I obtain t
Hi CVP,
I'm not so much familiar with the internals of the checkpointing system,
but maybe Stephan (in CC) has an idea what's going on here.
Best, Fabian
2016-09-23 11:33 GMT+02:00 Chakravarthy varaga :
> Hi Aljoscha & Fabian,
>
> I have a stream application that has 2 stream source as belo
Hi Markus,
thanks for the stacktraces!
The client is indeed stuck in the optimizer. I have to look a bit more into
this.
Did you try to set JoinHints in your plan? That should reduce the plan
space that is enumerated and therefore reduce the optimization time (maybe
enough to run your application
Hi Buvana,
A TaskManager runs as a single JVM process. A TaskManager provides a
certain number of processing slots. Slots do not guard CPU time, IO, or JVM
memory. At the moment they only isolate managed memory which is only used
for batch processing. For streaming applications their only purpose
Hi Anchit,
Flink does not yet have a streaming sink connector for HBase. Some members
of the community are working on this though [1].
I think we resolved a similar issue for the Kafka connector recently [2].
Maybe the related commits contain some relevant code for your problem.
Best, Fabian
[1]
Hi Ken,
you can certainly have partitioned sources and sinks. You can control the
parallelism by calling .setParallelism() method.
If you need a partitioned sink, you can call .keyBy() to hash partition.
I did not completely understand the requirements of your program. Can you
maybe provide pseud
Great, thanks!
I gave you contributor permissions in JIRA. You can now also assign issues
to yourself if you decide to continue to contribute.
Best, Fabian
2016-09-29 16:48 GMT+02:00 jaxbihani :
> Hi Fabian
>
> My JIRA user is: jaxbihani
> I have created a pull request for the fix :
> https://gi
Hi Neil,
"B" only refers to the key-part of the record, the number is the timestamp
(as you assumed out). The payload of the record is not displayed in the
figure. So B35 and B31 are two different records with identical key.
The keyBy() operation sends all records with the same key to the same
sub
n fact the key of
> the stream and not the id of the event? Probably seems trivial, but I
> struggled with this one. haha. I’ll submit a PR for the docs if there’s
> interest.
>
> Neil
>
> On Sep 29, 2016, at 11:36 AM, Fabian Hueske wrote:
>
> Hi Neil,
>
> "B&
Hi Simone,
I think I have a solution for your problem:
val s: DataStream[(Long, Int, ts)] = ??? // (id, state, time)
val stateChanges: DataStream[(Int, Int)] = s // (state, cntUpdate)
.keyBy(_._1) // key by id
.flatMap(new StateUpdater) // StateUpdater is a stateful FlatMapFunction.
It has a
, this is slightly different from what I need.
>
> 2016-09-30 10:04 GMT+02:00 Fabian Hueske :
>
>> Hi Simone,
>>
>> I think I have a solution for your problem:
>>
>> val s: DataStream[(Long, Int, ts)] = ??? // (id, state, time)
>>
>> val st
FYI: FLINK-4497 [1] requests Scala tuple and case class support for the
Cassandra sink and was opened about a month ago.
[1] https://issues.apache.org/jira/browse/FLINK-4497
2016-09-30 23:14 GMT+02:00 Stephan Ewen :
> How hard would it be to add case class support?
>
> Internally, tuples and cas
Hi Philipp,
If I got your requirements right you would like to:
1) load an initial hashmap via JDBC
2) update the hashmap from a stream
3) use the hashmap to enrich another stream.
You can use a CoFlatMap to do this:
stream1.connect(stream2).flatMap(new YourCoFlatMapFunction).
YourCoFlatMapFunc
Thanks Hironori for sharing these excellent news!
Do you think it would be possible to add your use case to Flink's
Powered-By wiki page [1] ?
Thanks, Fabian
[1] https://cwiki.apache.org/confluence/display/FLINK/Powered+by+Flink
2016-10-04 14:04 GMT+02:00 Hironori Ogibayashi :
> Hello,
>
> Just
Hi Yassine,
AFAIK, there is no built-in way to ignore corrupted compressed files.
You could try to implement a FileInputFormat that wraps the CsvInputFormat
and forwards all calls to the wrapped CsvIF.
The wrapper would also catch and ignore the EOFException.
If you do that, you would not be able
Hi,
the TextSocketSink is rather meant for demo purposes than to be used in an
actual applications.
I am not aware of any other built-in source that would provide what you are
looking for.
You can implement a custom SourceFunction that does what you need.
Best, Fabian
2016-10-05 9:48 GMT+02:00 A
PM, Hironori Ogibayashi <
> ogibaya...@gmail.com>
> > wrote:
> >>
> >> Thank you for the response.
> >> Regarding adding to the page, I will check with our PR department.
> >>
> >> Regards,
> >> Hironori
> >>
> >&g
Hi Alberto,
if you want to read a single column you have to wrap it in a Tuple1:
val text4 = env.readCsvFile[Tuple1[String]]("file:data.csv"
,includedFields = Array(1))
Best, Fabian
2016-10-06 20:59 GMT+02:00 Alberto Ramón :
> I'm learning readCsvFile
> (I discover if the file ends on "/n", yo
Hi,
how are you executing your code? From an IDE or on a running Flink instance?
If you execute it on a running Flink instance, you have to look into the
.out files of the task managers (located in ./log/).
Best, Fabian
2016-10-06 22:08 GMT+02:00 drystan mazur :
> Hello I am reading a csv file
Hi Greg,
print is only eagerly executed for DataSet programs.
In the DataStream API, print() just appends a print sink and execute() is
required to trigger an execution.
2016-10-06 22:40 GMT+02:00 Greg Hogan :
> The program executes when you call print (same for collect), which is why
> you are
Maybe this can be done by assigning the same window id to each of the N
local windows, and do a
.keyBy(windowId)
.countWindow(N)
This should create a new global window for each window id and collect all N
windows.
Best, Fabian
2016-10-06 22:39 GMT+02:00 AJ Heller :
> The goal is:
> * to split
in
> the mapper that increments on every map call). It works, but by any chance
> is there a more succinct way to do it?
>
> On Thu, Oct 6, 2016 at 1:50 PM, Fabian Hueske wrote:
>
>> Maybe this can be done by assigning the same window id to each of the N
>> local windows, an
As the exception says the class
org.apache.flink.api.scala.io.jdbc.JDBCInputFormat does not exist.
You have to do:
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat
There is no Scala implementation of this class but you can also use Java
classes in Scala.
2016-10-07 21:38 GMT+02:00 Alber
> Your solution compile with out errors, but IncludedFields Isn't working:
> [image: Imágenes integradas 1]
>
> The output is incorrect:
> [image: Imágenes integradas 2]
>
> The correct result must be only 1º Column
> (a,1)
> (aa,1)
>
> 2016-10-06 21:37 GMT+02:
401 - 500 of 1193 matches
Mail list logo