New message processing time after recovery.

2017-06-27 Thread yunfan123
For example, my job failed at timestamp 1. 
Recovery from the checkpoint takes 600 seconds. 
So will the processing time of new elements entering my streams be 601?





Re: How to perform multiple stream join functionality

2017-06-27 Thread yunfan123
Flink 1.3? I'm using Flink 1.3; how can I implement this?





Re: Window data retention - Appending to previous data and processing

2017-06-27 Thread G.S.Vijay Raajaa
Thanks a lot. It works fine !!

Regards,
Vijay Raajaa GS

On Mon, Jun 26, 2017 at 7:01 PM, Aljoscha Krettek 
wrote:

> Hi,
>
> I think you should be able to do this by using:
>
>  * GlobalWindows as your window assigner
>  * a custom Trigger that fires on every element, sets a timer for your
> cleanup time, and purges when the cleanup timer fires
>  * a ProcessWindowFunction, so that you always get all the contents of
> the window when processing a window
>
> Best,
> Aljoscha
>
> > On 24. Jun 2017, at 18:37, G.S.Vijay Raajaa 
> wrote:
> >
> > Hi ,
> >
> > I am trying to implement a Flink job which requires a window that keeps
> adding new data to the data already in the window. The idea is that for every
> addition of a new record, the downstream chain up to the sink
> needs to be called. In the next iteration the window will have old data + new
> data and the chain is processed again.
> >
> > Iteration 1 - Window ( record_1) -> Proceed with downstream Chaining and
> call sink
> > Iteration 2 - Window(record_1,record_2) -> Proceed with downstream
> Chaining and call sink
> > Iteration n - Window(record_1,record_2,...,record n) -> Proceed with
> downstream Chaining and call sink
> >
> > Finally, clear the window at the configured time of the day.
> >
> > I hope the use case is clear. Looking forward to your thoughts on
> designing the same.
> >
> > Regards,
> > Vijay Raajaa G S
>
>
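
For reference, a minimal sketch of the approach described above (GlobalWindows plus a custom Trigger that fires on every element and purges on a daily cleanup timer, consumed by a ProcessWindowFunction). The class name, the fixed cleanup hour, and the downstream function are placeholders, not taken from the thread:

import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;

// Fires the window for every incoming element and purges its contents once a day.
public class FireOnEveryElementTrigger extends Trigger<Object, GlobalWindow> {

    private final int cleanupHourOfDay;   // hour (0-23, UTC) at which the window is cleared

    public FireOnEveryElementTrigger(int cleanupHourOfDay) {
        this.cleanupHourOfDay = cleanupHourOfDay;
    }

    @Override
    public TriggerResult onElement(Object element, long timestamp, GlobalWindow window, TriggerContext ctx) throws Exception {
        // make sure a cleanup timer is registered, then emit the whole window contents
        ctx.registerProcessingTimeTimer(nextCleanupTime(ctx.getCurrentProcessingTime()));
        return TriggerResult.FIRE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, GlobalWindow window, TriggerContext ctx) throws Exception {
        // cleanup time reached: drop the accumulated elements, keep the window itself
        return TriggerResult.PURGE;
    }

    @Override
    public TriggerResult onEventTime(long time, GlobalWindow window, TriggerContext ctx) throws Exception {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(GlobalWindow window, TriggerContext ctx) throws Exception {
        // nothing extra to clean up in this simple sketch
    }

    private long nextCleanupTime(long now) {
        long dayMs = 24L * 60 * 60 * 1000;
        long todayCleanup = (now - (now % dayMs)) + cleanupHourOfDay * 60L * 60 * 1000;
        return todayCleanup > now ? todayCleanup : todayCleanup + dayMs;
    }
}

// Usage sketch:
// input.keyBy(...)
//      .window(GlobalWindows.create())
//      .trigger(new FireOnEveryElementTrigger(2))      // clear the window daily at 02:00 UTC
//      .process(new MyProcessWindowFunction())         // sees record_1 .. record_n on every fire
//      .addSink(...);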


Re: Problem with Summerization

2017-06-27 Thread Greg Hogan
Hi Ali,

Could you print and include a gellyGraph which results in this error?

Greg


> On Jun 27, 2017, at 2:48 PM, rost...@informatik.uni-leipzig.de wrote:
> 
> Dear All,
> 
> I do not understand what is causing the error in the following code.
> 
> Graph<..., ..., NullValue> gellyGraph = ...
> 
> Graph<..., Summarization.VertexValue<...>, Summarization.EdgeValue<NullValue>> g =
>     gellyGraph.run(new Summarization<..., ..., NullValue>());
> 
> g.getVertices().print(); // this one works fine
> g.getEdges().print();    // this one gives the following error
> 
> ...
> org.apache.flink.types.NullFieldException: Field 2 is null, but expected to hold a value.
>   at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.serialize(TupleSerializer.java:126)
>   at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.serialize(TupleSerializer.java:30)
>   at org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:56)
>   at org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:83)
>   at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:85)
>   at org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:65)
>   at org.apache.flink.runtime.operators.util.metrics.CountingCollector.collect(CountingCollector.java:35)
>   at org.apache.flink.graph.library.Summarization$VertexGroupReducer.reduce(Summarization.java:318)
>   at org.apache.flink.runtime.operators.GroupReduceDriver.run(GroupReduceDriver.java:131)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:486)
>   at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:351)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
>   at java.lang.Thread.run(Thread.java:748)
> 
> I also tried with different types for EdgeValue. The same error appears.
> When I look at the result vertices, everything looks fine.
> But the result edges have this problem.
> 
> Any idea?
> 
> Regards,
> Ali


Problem with Summerization

2017-06-27 Thread rostami

Dear All,

I do not understand what is causing the error in the following code.

Graph<..., ..., NullValue> gellyGraph = ...

Graph<..., Summarization.VertexValue<...>, Summarization.EdgeValue<NullValue>> g =
    gellyGraph.run(new Summarization<..., ..., NullValue>());

g.getVertices().print(); // this one works fine
g.getEdges().print();    // this one gives the following error

...
org.apache.flink.types.NullFieldException: Field 2 is null, but expected to hold a value.
	at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.serialize(TupleSerializer.java:126)
	at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.serialize(TupleSerializer.java:30)
	at org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:56)
	at org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:83)
	at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:85)
	at org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:65)
	at org.apache.flink.runtime.operators.util.metrics.CountingCollector.collect(CountingCollector.java:35)
	at org.apache.flink.graph.library.Summarization$VertexGroupReducer.reduce(Summarization.java:318)
	at org.apache.flink.runtime.operators.GroupReduceDriver.run(GroupReduceDriver.java:131)
	at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:486)
	at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:351)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
	at java.lang.Thread.run(Thread.java:748)

I also tried with different types for EdgeValue. The same error appears.
When I look at the result vertices, everything looks fine.
But the result edges have this problem.

Any idea?

Regards,
Ali



Re: MapR libraries shading issue

2017-06-27 Thread ani.desh1512
Again, as I mentioned in the MapR thread:

So, after some more digging, I found out that you can make Flink use the
default Java truststore by passing
-Djavax.net.ssl.trustStore=$JAVA_HOME/jre/lib/security/cacerts as JVM_ARGS
for Flink.
I tested this approach with AWS and Datadog along with MapR Streams and Tables,
and it seems to have worked so far.

I am not sure if this is the right approach, but if it indeed is, then we
should include it in the Flink MapR documentation.





Incremental aggregation using Fold and failure recovery

2017-06-27 Thread Ahmad Hassan
Hi All,

I am collecting millions of events per hour for 'N' products,
where 'N' can be 50k. I use the following fold mechanism with a sliding
window:

final DataStream eventStream = inputStream
    .keyBy(TENANT, CATEGORY)
    .window(SlidingProcessingTimeWindows.of(Time.hours(1), Time.minutes(5)))
    .fold(new WindowStats(), new ProductAggregationMapper(),
          new ProductAggregationWindowFunction());

In the WindowStats class, I keep a HashMap of per-product metrics. So for 50k
products I will have 50k entries in the map within the WindowStats class.

My question is: if I set env.enableCheckpointing(1000), will the WindowStats
instance for each existing window automatically be checkpointed and
restored on recovery? If not, how can I better implement the above
use case to store the product metrics using the fold operation?

Thanks for all the help.

Best Regards,


Re: Partitioner is spending around 2 to 4 minutes while pushing data to next operator

2017-06-27 Thread sohimankotia
So in the following execution flow:

 source -> map -> partitioner -> flatmap -> sink 

I am attaching the current time to the tuple while emitting from the map function, and
then extracting that timestamp value from the tuple in the flatMap as the very first
step. Then I am calculating the difference between the time attached while emitting
from the map and the time of entering the flatMap.
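
For illustration, a self-contained sketch of that measurement; the String payloads and the hash-based partitioner are stand-ins, not the job's real logic:

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.Partitioner;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class MapToFlatMapLatencyProbe {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "b", "c")
            // map: attach the emit time as the second tuple field
            .map(new MapFunction<String, Tuple2<String, Long>>() {
                @Override
                public Tuple2<String, Long> map(String value) {
                    return new Tuple2<>(value, System.currentTimeMillis());
                }
            })
            // stand-in for the job's custom partitioner, keyed on the payload (tuple field 0)
            .partitionCustom(new Partitioner<String>() {
                @Override
                public int partition(String key, int numPartitions) {
                    return Math.abs(key.hashCode()) % numPartitions;
                }
            }, 0)
            // flatMap: the very first step measures the gap since the map emitted the tuple
            .flatMap(new FlatMapFunction<Tuple2<String, Long>, String>() {
                @Override
                public void flatMap(Tuple2<String, Long> value, Collector<String> out) {
                    long latencyMs = System.currentTimeMillis() - value.f1;
                    out.collect(value.f0 + ": " + latencyMs + " ms between map and flatMap");
                }
            })
            .print();

        env.execute("map -> partitioner -> flatMap latency probe");
    }
}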





Re: java.lang.IllegalArgumentException with 1.3.0 and Elastic Search connector

2017-06-27 Thread Aljoscha Krettek
Hi Victor,

What are you using as a Source? The stack trace you posted indicates that the 
problem is happening while specifying the source. This might be caused by some 
interactions with the Elasticsearch dependency.

Best,
Aljoscha

> On 17. Jun 2017, at 18:36, Victor Godoy Poluceno wrote:
> 
> Hi,
> 
> after migrating to Flink 1.3.0, I started to get this exception when running 
> in local mode:
> 
> [error] (run-main-0) java.lang.IllegalArgumentException
> java.lang.IllegalArgumentException
> at org.apache.flink.shaded.org.objectweb.asm.ClassVisitor.(Unknown 
> Source)
> at org.apache.flink.shaded.org.objectweb.asm.ClassVisitor.(Unknown 
> Source)
> at 
> org.apache.flink.api.scala.InnerClosureFinder.(ClosureCleaner.scala:279)
> at 
> org.apache.flink.api.scala.ClosureCleaner$.getInnerClasses(ClosureCleaner.scala:95)
> at 
> org.apache.flink.api.scala.ClosureCleaner$.clean(ClosureCleaner.scala:115)
> at 
> org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.scalaClean(StreamExecutionEnvironment.scala:670)
> at 
> org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.addSource(StreamExecutionEnvironment.scala:600)
> at org.azion.com.Job$.main(Job.scala:39)
> at org.azion.com.Job.main(Job.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> 
> This exception only happens when using the Flink Elasticsearch connector; 
> everything works fine when the ES connector is removed from the dependencies.
> 
> I am using Scala 2.11.11, Elasticsearch Connector 2, and Java OpenJDK 
> version "1.8.0_111".
> 
> Any ideas about the problem here?
> 
> By the way, this seems very close to this: [1], [2]
> 
> [1] - 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/asm-IllegalArgumentException-with-1-0-0-td5411.html
>  
> 
> [2] - https://issues.apache.org/jira/browse/FLINK-3328 
> 
> 
> -- 
> hooray!
> 
> --
> Victor Godoy Poluceno



Partition index from partitionCustom vs getIndexOfThisSubtask downstream

2017-06-27 Thread Urs Schoenenberger
Hi,

if I use DataStream::partitionCustom, will the partition number that my
custom Partitioner returns always be equal to getIndexOfThisSubtask
in the following operator?

A test case with different parallelisms seems to suggest this is true,
but the Javadoc seems ambiguous to me since the Partitioner doc talks
about the "partition index" while the RuntimeContext doc talks about the
"index of the parallel subtask".

Thanks,
Urs

-- 
Urs Schönenberger - urs.schoenenber...@tngtech.com

TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082
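
Not from the thread, but a small self-contained check one could run to compare the two numbers empirically (Flink 1.3 streaming API; the class name and input data are made up):

import org.apache.flink.api.common.functions.Partitioner;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PartitionIndexCheck {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4);

        env.fromElements("a", "b", "c", "d", "e", "f")
            .partitionCustom(
                new Partitioner<String>() {
                    @Override
                    public int partition(String key, int numPartitions) {
                        // the "partition index" returned by the custom Partitioner
                        return Math.abs(key.hashCode()) % numPartitions;
                    }
                },
                new KeySelector<String, String>() {
                    @Override
                    public String getKey(String value) {
                        return value;
                    }
                })
            .map(new RichMapFunction<String, String>() {
                @Override
                public String map(String value) {
                    // the subtask index observed in the following operator
                    int subtask = getRuntimeContext().getIndexOfThisSubtask();
                    int partition = Math.abs(value.hashCode())
                            % getRuntimeContext().getNumberOfParallelSubtasks();
                    return value + ": partitioner said " + partition + ", subtask is " + subtask;
                }
            })
            .print();

        env.execute("partitionCustom vs getIndexOfThisSubtask");
    }
}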


Re: How to perform multiple stream join functionality

2017-06-27 Thread Aljoscha Krettek
Hi,

I’m afraid there is also no simple, built-in feature for doing this in Flink 
1.3.

Best,
Aljoscha

> On 27. Jun 2017, at 10:37, yunfan123  wrote:
> 
> In flink release 1.3, can I do this in simple way?
> 
> 
> 
> --
> View this message in context: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-perform-multiple-stream-join-functionality-tp7184p14011.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive at 
> Nabble.com.



Re: Combining streams with static data and using REST API as a sink

2017-06-27 Thread Aljoscha Krettek
A quick note on this: the side-input API is still ongoing work, and it turns out 
it's more complicated than expected (obviously …), so we will need quite a bit more work on 
other parts of Flink before we can provide a good built-in solution.

In the meantime, you can check out the Async I/O operator [1]. I think this 
fits your use case of accessing an external system quite well because it allows 
firing off several requests to external systems at the same time.

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/asyncio.html
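
A rough sketch of what that can look like with the Flink 1.3 async I/O API from [1]; the RestClient, its lookup() call, and the endpoint are hypothetical, and the AsyncFunction/AsyncCollector names follow the 1.3 documentation:

import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import org.apache.flink.streaming.api.functions.async.collector.AsyncCollector;

// Enriches each incoming id with a value fetched from an external REST service.
public class RestEnrichment extends RichAsyncFunction<String, String> {

    private transient RestClient client;   // hypothetical HTTP client

    @Override
    public void open(Configuration parameters) {
        client = new RestClient("http://example.com/api");   // placeholder endpoint
    }

    @Override
    public void asyncInvoke(String id, AsyncCollector<String> collector) {
        // run the (hypothetical) lookup without blocking the operator thread
        CompletableFuture
            .supplyAsync(() -> client.lookup(id))
            .thenAccept(result -> collector.collect(Collections.singletonList(id + "," + result)));
    }
}

// Usage sketch: at most 100 requests in flight, 1 second timeout per request.
// DataStream<String> enriched = AsyncDataStream.unorderedWait(
//         ids, new RestEnrichment(), 1000, TimeUnit.MILLISECONDS, 100);
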
> On 21. Jun 2017, at 18:44, Nancy Estrada  wrote:
> 
> Hi Josh,
> 
> I have a use-case similar to yours. I need to join a stream with data from a
> database to which I have access via a REST API. Since the side-inputs API
> is still ongoing work, I am wondering how you approached it.
> Did you use a rich function, updating it periodically?
> 
> Thank you in advance!
> Nancy
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Combining-streams-with-static-data-and-using-REST-API-as-a-sink-tp7083p13902.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive at 
> Nabble.com.



Re: Session ID Error

2017-06-27 Thread Aljoscha Krettek
Hi Will,

How did you configure Flink and what is the command that you’re using to submit 
your job/session?

Best,
Aljoscha

> On 21. Jun 2017, at 01:44, Will Walters  wrote:
> 
> Hello,
> 
> In attempting to submit a job via Yarn session on Hadoop cluster (using Flink 
> 1.2.1), I get timed out and receive the following error from the server:
> 
> Discard message 
> LeaderSessionMessage(----,SubmitJob(JobGraph(jobId:
>  5d0406fc547af4fc36bf78ed812a3f90),EXECUTION_RESULT_AND_STATE_CHANGES)) 
> because the expected leader session ID None did not equal the received leader 
> session ID Some(----).
> This seems to be an issue with the way that a Null session ID is handled. Is 
> there a way to manually set the session ID? Any advice would be greatly 
> appreciated.
> 
> Thank you,
> Will Walters.



Re: Using Custom Partitioner in Streaming with parallelism 1 adding latency

2017-06-27 Thread Aljoscha Krettek
Hi,

That depends, how are you measuring and what are your results?

Best,
Aljoscha 
> On 19. Jun 2017, at 06:23, sohimankotia  wrote:
> 
> Thanks for the pointers, Aljoscha. 
> 
> I was just wondering: since the custom partitioner will run in a separate thread,
> is it possible that map -> custom partitioner -> flatMap can take more
> than 200 seconds if parallelism is still 1?
> 
> 
> 
> --
> View this message in context: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Using-Custom-Partitioner-in-Streaming-with-parallelism-1-adding-latency-tp13766p13822.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive at 
> Nabble.com.



Re: Partitioner is spending around 2 to 4 minutes while pushing data to next operator

2017-06-27 Thread Aljoscha Krettek
Hi,

What do you mean by latency and how are you measuring this in your job?

Best,
Aljoscha

> On 22. Jun 2017, at 14:23, sohimankotia  wrote:
> 
> Hi Chesnay,
> 
> I have data categorized on some attribute (the key in the partitioner) which
> has n possible values. As of now the job is enabled for only one value of
> that attribute. In a couple of days we will enable all values of the attribute
> with more parallelism, so that each attribute value's data gets processed in a
> single instance.
> 
> So, while running with parallelism 1, I just observed the 2 to 4 minute
> latency from map -> partitioner -> flatMap.
> 
> 
> 
> --
> View this message in context: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Partitioner-is-spending-around-2-to-4-minutes-while-pushing-data-to-next-operator-tp13913p13916.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive at 
> Nabble.com.



Re: Performance Improvement on Flink 1.2.0

2017-06-27 Thread Aljoscha Krettek
Just a quick remark about memory and number of slots: with your configuration 
of 30 slots but only ~20 GB of RAM, each processing slot does not have a lot of 
memory to work with. For batch programs this can be a problem. I would suggest 
using fewer but bigger slots, even if the number of cores is very high on your 
machine.

Best,
Aljoscha
> On 22. Jun 2017, at 17:36, Greg Hogan  wrote:
> 
> Some documentation on application profiling with Flink 1.3 (can be manually 
> inserted into the scripts for Flink 1.2):
>   
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/application_profiling.html
>  
> 
> 
> 
>> On Jun 22, 2017, at 9:24 AM, Stefan Richter wrote:
>> 
>> Hi,
>> 
>> the answer highly depends on what your job is doing and there is no 
>> information about that. Also, what is your performance target? Are you 
>> using batch or streaming? If you feel like the performance is lower than 
>> expected, I suggest that you do some profiling to figure out the hotspots.
>> For example, you could see that your job spends most of its time in type 
>> serialization, which is a common bottleneck. In this case, maybe you can 
>> write a faster custom serializer. Or rewriting the job (e.g. using early 
>> aggregation where possible etc.) can yield much more performance improvement 
>> than tuning magic numbers with no further knowledge about your job.
>> 
>> Best,
>> Stefan 
>> 
>>> On 22.06.2017 at 12:08, Samim Ahmed wrote:
>>> 
>>> Hi All,
>>> 
>>> This query regarding the flink performance improvement .
>>> 
>>> Flink configuration:
>>> using Flink in cluster mode with 3 slaves and a master configuration
>>> slots used: 30 (as the system has 30 cores)
>>> task manager memory: 30 GB
>>> parallelism used: 30
>>> jobmanager.heap.mb: 20480
>>> taskmanager.heap.mb: 20480
>>> taskmanager.numberOfTaskSlots: 30
>>> taskmanager.network.numberOfBuffers: 2
>>> 
>>> Input info:
>>> Input file : 1ROP(5min) data with  Nodes and 665K eps
>>> Total number of events :: 199498294 
>>> 
>>> Observation :
>>> Total time taken to complete the task = 6m24s
>>> 
>>> Can you please suggest what else I need to modify to get higher 
>>> performance in terms of less execution time. Thanks in advance.
>>> 
>>> 
>>> -- 
>>> Regards,
>>> Samim Ahmed 
>>> Mumbai
>>> 09004259232



Re: Different Window Sizes in keyed stream

2017-06-27 Thread Aljoscha Krettek
Hi Ahmad,

You could in fact do this by writing a custom WindowAssigner. Have a look at 
the assignWindows() method here: 
https://github.com/apache/flink/blob/12b4185c6c09101b64e12a84c33dc4d28f95cff9/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/windowing/assigners/SlidingProcessingTimeWindows.java#L45-L45
Instead of always assigning the same windows, you can type your WindowAssigner 
on the actual type of your stream and inspect the event when assigning windows.

Best,
Aljoscha
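
To make that concrete, a minimal sketch of such an assigner (processing time only; it keeps the built-in assigners' Object typing and casts to a hypothetical TenantEvent, and the per-tenant sizes are made-up examples):

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeutils.TypeSerializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.WindowAssigner;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.triggers.ProcessingTimeTrigger;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

// Sliding processing-time windows whose size and slide depend on the element's tenant.
public class PerTenantSlidingWindows extends WindowAssigner<Object, TimeWindow> {

    @Override
    public Collection<TimeWindow> assignWindows(Object element, long timestamp, WindowAssignerContext context) {
        TenantEvent event = (TenantEvent) element;   // hypothetical event type carrying the tenant id
        long size = sizeFor(event.getTenant());
        long slide = slideFor(event.getTenant());
        long now = context.getCurrentProcessingTime();

        List<TimeWindow> windows = new ArrayList<>((int) (size / slide));
        long lastStart = now - (now % slide);        // start of the most recent slide
        for (long start = lastStart; start > now - size; start -= slide) {
            windows.add(new TimeWindow(start, start + size));
        }
        return windows;
    }

    private long sizeFor(String tenant) {
        // made-up per-tenant configuration; in practice read this from a map or config service
        return "tenantA".equals(tenant) ? Time.minutes(2).toMilliseconds() : Time.minutes(10).toMilliseconds();
    }

    private long slideFor(String tenant) {
        return Time.minutes(1).toMilliseconds();
    }

    @Override
    public Trigger<Object, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
        return ProcessingTimeTrigger.create();
    }

    @Override
    public TypeSerializer<TimeWindow> getWindowSerializer(ExecutionConfig executionConfig) {
        return new TimeWindow.Serializer();
    }

    @Override
    public boolean isEventTime() {
        return false;
    }
}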

> On 23. Jun 2017, at 10:32, Ahmad Hassan  wrote:
> 
> Thanks Fabian for the advice!
> 
> Best Regards,
> 
> Dr. Ahmad Hassan
> 
> On 23 June 2017 at 09:05, Fabian Hueske wrote:
> Hi Ahmad,
> 
> that is not possible, at least not with Flink's built-in windows. 
> You can probably implement something like that on top of the DataStream API, 
> but I think it would take quite a bit of effort.
> 
> IMO, the better approach would be to start a separate Flink job per tenant. 
> This would also improve the isolation and failure behavior.
> 
> Best, Fabian
> 
> 2017-06-22 19:43 GMT+02:00 Ahmad Hassan:
> Hi All,
> 
> I want to know if Flink allows defining the sliding window size and slide time 
> on the fly. For example, I want to configure a sliding window of size 2 min and 
> slide 1 min for tenant A but size 10 min and slide min for tenant B in a 
> keyed stream, and so on for other tenants. My code is below.
> 
> final DataStream eventStream = inputStream
>     .keyBy(TENANT, CATEGORY)
>     .window(SlidingProcessingTimeWindows.of(Time.minutes(2), Time.minutes(1)))
>     .fold(new WindowStats(), new ProductAggregationMapper(), 
>           new ProductAggregationWindowFunction());
> 
> Can I do that for unlimited number of tenants in flink ?
> 
> Cheers,
> 
> Dr. Ahmad Hassan
> 
> 



Re: Default value - Time window expires with no data from source

2017-06-27 Thread Aljoscha Krettek
You mean you want to output some data when you know that you don’t have any 
counts for a given time window?

This is not (easily) possible in Flink right now because this would require an 
operation with parallelism one that determines that there is no data across all 
keys.

Best,
Aljoscha

> On 24. Jun 2017, at 18:22, G.S.Vijay Raajaa  wrote:
> 
> Hi,
> 
> I am trying to implement a Flink job which takes Twitter as the source 
> and collects tweets from a list of hashtags. The Flink job basically 
> aggregates the volume of tweets per hashtag in a given time frame. I have 
> implemented this successfully, but if there is no tweet across all the 
> hashtags I need to send out a default value of 0 for all hashtags. Not 
> sure how to implement this functionality.
> 
> Code Snippet :
> 
> env.addSource(source)
>     .flatMap(new ExtractHashTagsSymbols(tickers))
>     .keyBy(0)
>     .timeWindow(Time.seconds(Integer.parseInt(window_time)))
>     .sum(1)
>     .timeWindowAll(Time.seconds(Integer.parseInt(window_time)))
>     .apply(new GetVolume(tickerVolumeMap))
>     .addSink(new SinkFunction<JSONObject>() {
>         public void invoke(JSONObject value) throws Exception {
>             System.out.println("Twitter Volume:" + value.toString());
>             //JsonParser jsonParser = new JsonParser();
>             //JsonObject gsonObject = (JsonObject) jsonParser.parse(value.toString());
>             pushToSocket(value, socket_url);
>         }
>     });
> 
> 
> 
> The above code waits for the window_time time frame, computes the tweet volume, 
> and sends out a JSON object. 
> 
> Regards,
> 
> Vijay Raajaa GS 
> 



Re: Recursive Traversal of the Input Path Directory, Not working

2017-06-27 Thread Aljoscha Krettek
Hi,

Hadoop FileInputFormats (by default) also exclude hidden files (files starting 
with "." or "_"). You can override this behaviour in Flink by subclassing 
TextInputFormat and overriding the accept() method. You can use a custom input 
format with ExecutionEnvironment.readFile().

Regarding BucketingSink, you can change both the prefixes and suffixes of the 
various files using configuration methods.

Best,
Aljoscha
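
A sketch of that approach, assuming the file-filtering hook in FileInputFormat is acceptFile(FileStatus) (presumably the accept() method meant above; worth verifying against the Flink version in use). The paths are placeholders:

import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.FileStatus;
import org.apache.flink.core.fs.Path;

// A TextInputFormat that also accepts files whose names start with "_" or ".",
// which the default file enumeration skips as "hidden".
public class IncludeHiddenFilesInputFormat extends TextInputFormat {

    public IncludeHiddenFilesInputFormat(Path filePath) {
        super(filePath);
    }

    @Override
    public boolean acceptFile(FileStatus fileStatus) {
        return true;   // accept every file, including "_part-1-0.csv"-style names
    }
}

// Usage sketch with the batch API and recursive enumeration, as in the thread:
// ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
// Configuration config = new Configuration();
// config.setBoolean("recursive.file.enumeration", true);
// env.readFile(new IncludeHiddenFilesInputFormat(new Path("file:///some/dir")), "file:///some/dir")
//    .withParameters(config)
//    .print();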

> On 27. Jun 2017, at 11:53, Adarsh Jain  wrote:
> 
> Thanks Stefan, my colleague Shashank has filed a bug for the same in jira
> 
> https://issues.apache.org/jira/browse/FLINK-6993 
> 
> 
> Regards,
> Adarsh
> 
> On Fri, Jun 23, 2017 at 8:19 PM, Stefan Richter  > wrote:
> Hi,
> 
> I suggest that you simply open an issue for this in our jira, describing the 
> improvement idea. That should be the fastest way to get this changed.
> 
> Best,
> Stefan
> 
>> Am 23.06.2017 um 15:08 schrieb Adarsh Jain > >:
>> 
>> Hi Stefan,
>> 
>> I think I found the problem, try it with a file which starts with underscore 
>> in the name like "_part-1-0.csv".
>> 
>> While saving, Flink prefixes the file name with a "_"; however, while reading at 
>> folder level it does not pick up those files.
>> 
>> Can you suggest whether there is a setting so that it does not prepend the 
>> underscore while saving a file.
>> 
>> Regards,
>> Adarsh
>> 
>> On Fri, Jun 23, 2017 at 3:24 PM, Stefan Richter > > wrote:
>> No, that doesn’t make a difference and also works.
>> 
>>> Am 23.06.2017 um 11:40 schrieb Adarsh Jain >> >:
>>> 
>>> I am using "val env = ExecutionEnvironment.getExecutionEnvironment", can 
>>> this be the problem?
>>> 
>>> With "import org.apache.flink.api.scala.ExecutionEnvironment"
>>> 
>>> Using scala in my program.
>>> 
>>> Regards,
>>> Adarsh 
>>> 
>>> On Fri, Jun 23, 2017 at 3:01 PM, Stefan Richter 
>>> > wrote:
>>> I just copy pasted your code, adding the missing "val env = 
>>> LocalEnvironment.createLocalEnvironment()" and exchanged the string with a 
>>> local directory for some test files that I created. No other changes.
>>> 
 Am 23.06.2017 um 11:25 schrieb Adarsh Jain >:
 
 Hi Stefan,
 
 Thanks for your efforts in checking the same, still doesn't work for me. 
 
 Can you copy paste the code you used maybe I am doing some silly mistake 
 and am not able to figure out the same.
 
 Thanks again.
 
 Regards,
 Adarsh
 
 
 On Fri, Jun 23, 2017 at 2:32 PM, Stefan Richter 
 > wrote:
 Hi,
 
 I tried this out on the current master and the 1.3 release and both work 
 for me everything works exactly as expected, for file names, a directory, 
 and even nested directories.
 
 Best,
 Stefan
 
> Am 22.06.2017 um 21:13 schrieb Adarsh Jain  >:
> 
> Hi Stefan,
> 
> Yes, you understood right: when I give the full path including the filename it 
> works fine, however when I give the path only up to the 
> directory it does not read the data, and doesn't print any exceptions either ... 
> I am also not sure why it is behaving like this.
> 
> Should be easily replicable, in case you can try. Will be really helpful.
> 
> Regards,
> Adarsh
> 
> On Thu, Jun 22, 2017 at 9:00 PM, Stefan Richter 
> > wrote:
> Hi,
> 
> I am not sure I am getting the problem right: the code works if you use a 
> file name, but it does not work for directories? What exactly is not 
> working? Do you get any exceptions?
> 
> Best,
> Stefan
> 
>> Am 22.06.2017 um 17:01 schrieb Adarsh Jain > >:
>> 
>> Hi,
>> 
>> I am trying to use "Recursive Traversal of the Input Path Directory" in 
>> Flink 1.3 using scala. Snippet of my code below. If I give exact file 
>> name it is working fine. Ref 
>> https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/batch/index.html
>>  
>> 
>> 
>> import org.apache.flink.api.java.utils.ParameterTool
>> import org.apache.flink.api.java.{DataSet, ExecutionEnvironment}
>> import org.apache.flink.configuration.Configuration
>> 
>> val config = new Configuration
>> config.setBoolean("recursive.file.enumeration",true)

Re: Checkpointing with RocksDB as statebackend

2017-06-27 Thread vinay patil
Hi Stephan,

I am observing a similar issue with Flink 1.2.1.

The memory is continuously increasing and data is not getting flushed to
disk.

I have attached a snapshot for reference.

Also, the data processed till now is only 17GB, and yet above 120GB of memory is
being used.

Is there any change w.r.t. the RocksDB configuration?


 

Regards,
Vinay Patil
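
Not a fix for the memory growth itself, but for reference, a sketch of where the RocksDB backend's options can be adjusted; the checkpoint URI is a placeholder and the predefined options profile is chosen arbitrarily:

import org.apache.flink.contrib.streaming.state.PredefinedOptions;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbBackendSetup {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        RocksDBStateBackend backend = new RocksDBStateBackend("hdfs:///flink/checkpoints");
        // one of the shipped option profiles; custom DBOptions/ColumnFamilyOptions can be
        // supplied via setOptions(OptionsFactory) instead
        backend.setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED_HIGH_MEM);

        env.setStateBackend(backend);
        env.enableCheckpointing(60_000);   // checkpoint every minute

        // ... build and execute the job as usual
    }
}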





Re: Recursive Traversal of the Input Path Directory, Not working

2017-06-27 Thread Adarsh Jain
Thanks Stefan, my colleague Shashank has filed a bug for the same in jira

https://issues.apache.org/jira/browse/FLINK-6993

Regards,
Adarsh

On Fri, Jun 23, 2017 at 8:19 PM, Stefan Richter  wrote:

> Hi,
>
> I suggest that you simply open an issue for this in our jira, describing
> the improvement idea. That should be the fastest way to get this changed.
>
> Best,
> Stefan
>
> Am 23.06.2017 um 15:08 schrieb Adarsh Jain :
>
> Hi Stefan,
>
> I think I found the problem, try it with a file which starts with
> underscore in the name like "_part-1-0.csv".
>
> While saving, Flink prefixes the file name with a "_"; however, while reading at
> folder level it does not pick up those files.
>
> Can you suggest whether there is a setting so that it does not prepend the
> underscore while saving a file.
>
> Regards,
> Adarsh
>
> On Fri, Jun 23, 2017 at 3:24 PM, Stefan Richter <
> s.rich...@data-artisans.com> wrote:
>
>> No, that doesn’t make a difference and also works.
>>
>> Am 23.06.2017 um 11:40 schrieb Adarsh Jain :
>>
>> I am using "val env = ExecutionEnvironment.getExecutionEnvironment", can
>> this be the problem?
>>
>> With "import org.apache.flink.api.scala.ExecutionEnvironment"
>>
>> Using scala in my program.
>>
>> Regards,
>> Adarsh
>>
>> On Fri, Jun 23, 2017 at 3:01 PM, Stefan Richter <
>> s.rich...@data-artisans.com> wrote:
>>
>>> I just copy pasted your code, adding the missing "val env
>>> = LocalEnvironment.createLocalEnvironment()" and exchanged the string
>>> with a local directory for some test files that I created. No other changes.
>>>
>>> Am 23.06.2017 um 11:25 schrieb Adarsh Jain :
>>>
>>> Hi Stefan,
>>>
>>> Thanks for your efforts in checking the same, still doesn't work for me.
>>>
>>> Can you copy paste the code you used maybe I am doing some silly mistake
>>> and am not able to figure out the same.
>>>
>>> Thanks again.
>>>
>>> Regards,
>>> Adarsh
>>>
>>>
>>> On Fri, Jun 23, 2017 at 2:32 PM, Stefan Richter <
>>> s.rich...@data-artisans.com> wrote:
>>>
 Hi,

 I tried this out on the current master and the 1.3 release and both
 work for me everything works exactly as expected, for file names, a
 directory, and even nested directories.

 Best,
 Stefan

 Am 22.06.2017 um 21:13 schrieb Adarsh Jain :

 Hi Stefan,

 Yes, you understood right: when I give the full path including the filename it
 works fine, however when I give the path only up to the
 directory it does not read the data, and doesn't print any exceptions either
 ... I am also not sure why it is behaving like this.

 Should be easily replicable, in case you can try. Will be really
 helpful.

 Regards,
 Adarsh

 On Thu, Jun 22, 2017 at 9:00 PM, Stefan Richter <
 s.rich...@data-artisans.com> wrote:

> Hi,
>
> I am not sure I am getting the problem right: the code works if you
> use a file name, but it does not work for directories? What exactly is not
> working? Do you get any exceptions?
>
> Best,
> Stefan
>
> Am 22.06.2017 um 17:01 schrieb Adarsh Jain :
>
> Hi,
>
> I am trying to use "Recursive Traversal of the Input Path Directory"
> in Flink 1.3 using Scala. A snippet of my code is below. If I give the exact file
> name it is working fine. Ref:
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/batch/index.html
>
> import org.apache.flink.api.java.utils.ParameterTool
> import org.apache.flink.api.java.{DataSet, ExecutionEnvironment}
> import org.apache.flink.configuration.Configuration
>
> val config = new Configuration
> config.setBoolean("recursive.file.enumeration",true)
>
> val featuresSource: String = "file:///Users/adarsh/Documents/testData/featurecsv/31c710ac40/2017/06/22"
>
> val testInput = env.readTextFile(featuresSource).withParameters(config)
> testInput.print()
>
> Please guide how to fix this.
>
> Regards,
> Adarsh
>
>
>


>>>
>>>
>>
>>
>
>


Re: How to perform multiple stream join functionality

2017-06-27 Thread yunfan123
In Flink release 1.3, can I do this in a simple way?


