rocksdb without checkpointing

2017-02-15 Thread Abhishek R. Singh
Is it possible to set state backend as RocksDB without asking it to checkpoint? We are trying to do application level checkpointing (since it gives us better flexibility to upgrade our flink pipeline and also restore state in a application specific upgrade friendly way). So we don’t really need
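The poster's goal is application-level checkpointing: persisting operator state yourself, in an upgrade-friendly format, instead of relying on Flink-managed snapshots. A minimal pure-Python sketch of that idea (the function names and JSON format here are illustrative assumptions, not Flink API):

```python
import json
import os
import tempfile

def save_checkpoint(state, path):
    # Write the state atomically: dump to a temp file in the same
    # directory, then rename over the target so readers never see
    # a half-written checkpoint.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    # Restore previously saved application state.
    with open(path) as f:
        return json.load(f)

path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
state = {"offset": 42, "counts": {"a": 3}}
save_checkpoint(state, path)
print(load_checkpoint(path) == state)  # True
```

Using a self-describing format like JSON (rather than an opaque backend snapshot) is what makes the restore "application specific upgrade friendly": new pipeline versions can read and migrate old state explicitly.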

Re: weird client failure/timeout

2017-01-23 Thread Abhishek R. Singh
) at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.initializeForJob(RocksDBStateBackend.java:304) ... 6 more > On Jan 23, 2017, at 8:20 AM, Abhishek R. Singh > <abhis...@tetrationanalytics.com> wrote: > > Is there a limit on how many DataStreams can be defined in a st

Re: weird client failure/timeout

2017-01-23 Thread Abhishek R. Singh
combined = combined.union(streams.get(i)); } combined.print().setParallelism(1); } else { // die parallel for (int i = 1; i < nParts; i++) { streams.get(i).print(); } } > On Jan 23, 2017, at 6:14 AM, Abhishek R. Singh > <abhis...@tetrationanalytics.com> wrote: >

Re: weird client failure/timeout

2017-01-23 Thread Abhishek R. Singh
I even make it 10 minutes: akka.client.timeout: 600s But doesn’t feel like it is taking effect. It still comes out at about the same time with the same error. -Abhishek- > On Jan 23, 2017, at 6:04 AM, Abhishek R. Singh > <abhis...@tetrationanalytics.com> wrote: > > ye

Re: weird client failure/timeout

2017-01-23 Thread Abhishek R. Singh
yes, I had increased it to 5 minutes. It just sits there and bails out again. > On Jan 23, 2017, at 1:47 AM, Jonas wrote: > > The exception says that > > Did you already try that? > > > > -- > View this message in context: >

Re: weird client failure/timeout

2017-01-23 Thread Abhishek R. Singh
I am using version 1.1.4 (latest stable) > On Jan 23, 2017, at 12:41 AM, Abhishek R. Singh > <abhis...@tetrationanalytics.com> wrote: > > I am trying to construct a topology like this (shown for parallelism of 4) - > basically n parallel windowed processing sub-pipelin

weird client failure/timeout

2017-01-23 Thread Abhishek R. Singh
I am trying to construct a topology like this (shown for parallelism of 4) - basically n parallel windowed processing sub-pipelines with single source and single sink: I am getting the following failure (if I go beyond 28 - found empirically using binary search). There is nothing in the job

Re: checkpoint notifier not found?

2016-12-14 Thread Abhishek R. Singh
tory. > > On Wed, Dec 14, 2016 at 1:20 AM, Abhishek R. Singh > <abhis...@tetrationanalytics.com <mailto:abhis...@tetrationanalytics.com>> > wrote: > Not sure how to go from here. How do I create a PR for this? > > $ git branch > * doc-checkpoint-notify

Re: checkpoint notifier not found?

2016-12-13 Thread Abhishek R. Singh
see the `org.apache.flink.runtime.state.CheckpointListener` interface. ## State Checkpoints in Iterative Jobs > On Dec 12, 2016, at 3:11 PM, Abhishek R. Singh > <abhis...@tetrationanalytics.com> wrote: > > https://issues.apache.org/jira/browse/FLINK-5323 > <https:

Re: checkpoint notifier not found?

2016-12-12 Thread Abhishek R. Singh
ack. Love the project > !! > > Thanks for the awesomeness. > > > On Mon, Dec 12, 2016 at 12:29 PM Stephan Ewen <se...@apache.org > <mailto:se...@apache.org>> wrote: > Thanks for reporting this. > It would be awesome if you could file a JIRA or a pull req

Re: checkpoint notifier not found?

2016-12-09 Thread Abhishek R. Singh
import org.apache.flink.runtime.state.CheckpointListener; -Abhishek- > On Dec 9, 2016, at 4:30 PM, Abhishek R. Singh > <abhis...@tetrationanalytics.com> wrote: > > I can’t seem to find CheckpointNotifier. Appreciate help ! > > CheckpointNotifier is not a member of package > org.apac

checkpoint notifier not found?

2016-12-09 Thread Abhishek R. Singh
I can’t seem to find CheckpointNotifier. Appreciate the help! CheckpointNotifier is not a member of package org.apache.flink.streaming.api.checkpoint. From my pom.xml: groupId org.apache.flink, artifactId flink-scala_2.11, version 1.1.3

Re: StructuredStreaming status

2016-10-19 Thread Abhishek R. Singh
Its not so much about latency actually. The bigger rub for me is that the state has to be reshuffled every micro/mini-batch (unless I am not understanding it right - spark 2.0 state model i.e.). Operator model avoids it by preserving state locality. Event time processing and state purging are

flink async snapshots

2016-05-19 Thread Abhishek R. Singh
g something? > >> On 19 May 2016, at 20:36, Abhishek R. Singh <abhis...@tetrationanalytics.com >> <mailto:abhis...@tetrationanalytics.com>> wrote: >> >> I was wondering how checkpoints can be async? Because your state is >> constantly mutating. You

Re: flink snapshotting fault-tolerance

2016-05-19 Thread Abhishek R. Singh
I was wondering how checkpoints can be async? Because your state is constantly mutating. You probably need versioned state, or immutable data structs? -Abhishek- > On May 19, 2016, at 11:14 AM, Paris Carbone wrote: > > Hi Stavros, > > Currently, rollback failure recovery in
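The question is how a snapshot can proceed asynchronously while the state keeps mutating, and the answer the poster guesses at (versioned or copy-on-write state) is indeed a standard technique. A plain-Python sketch of the copy-on-write idea, purely illustrative of the concept, not of Flink's actual RocksDB implementation:

```python
class VersionedState:
    """Copy-on-write map: snapshot() freezes the current version at no
    cost; the first write after a snapshot copies the dict, so the
    frozen snapshot never changes while processing continues."""

    def __init__(self):
        self._current = {}
        self._frozen = False

    def put(self, key, value):
        if self._frozen:
            # First write after a snapshot: copy, then mutate the copy.
            self._current = dict(self._current)
            self._frozen = False
        self._current[key] = value

    def snapshot(self):
        self._frozen = True
        return self._current  # safe: future puts copy before writing

s = VersionedState()
s.put("count", 1)
snap = s.snapshot()   # the "async" checkpoint reads from snap
s.put("count", 2)     # live state keeps mutating concurrently
print(snap["count"], s._current["count"])  # 1 2
```

This is why immutability (or versioning) makes async snapshots possible: the checkpoint writer holds a consistent frozen version while the operator mutates a fresh one.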

custom sources

2016-05-17 Thread Abhishek R. Singh
Hi, Can we define custom sources in Flink? Control the barriers and (thus) checkpoints at good watermark points? -Abhishek-

Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-21 Thread Abhishek R. Singh
You had: RDD.reduceByKey((x,y) => x+y) RDD.take(3) Maybe try: rdd2 = RDD.reduceByKey((x,y) => x+y) rdd2.take(3) -Abhishek- On Aug 20, 2015, at 3:05 AM, satish chandra j jsatishchan...@gmail.com wrote: HI All, I have data in RDD as mentioned below: RDD : Array[(Int),(Int)] =
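The point of the reply is that `reduceByKey` is a transformation that returns a *new* RDD; calling `take` on the original shows nothing of the reduction. A pure-Python stand-in for those semantics (not Spark itself; `reduce_by_key` is an illustrative helper):

```python
from collections import defaultdict
from functools import reduce

def reduce_by_key(pairs, f):
    """Return a NEW list of (key, reduced_value) pairs; the input is
    left untouched, mirroring how RDD.reduceByKey returns a new RDD."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return [(k, reduce(f, vs)) for k, vs in groups.items()]

rdd = [("a", 1), ("b", 2), ("a", 3)]
rdd2 = reduce_by_key(rdd, lambda x, y: x + y)  # capture the result
print(sorted(rdd2))  # [('a', 4), ('b', 2)]
print(len(rdd))      # original unchanged: 3
```

Discarding the return value (as in the original `RDD.reduceByKey(...)` followed by `RDD.take(3)`) inspects the untransformed data, which is exactly the bug the reply points out.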

Re: Writing to multiple outputs in Spark

2015-08-14 Thread Abhishek R. Singh
A workaround would be to have multiple passes on the RDD and each pass write its own output? Or in a foreachPartition do it in a single pass (open up multiple files per partition to write out)? -Abhishek- On Aug 14, 2015, at 7:56 AM, Silas Davis si...@silasdavis.net wrote: Would it be right

Re: tachyon

2015-08-07 Thread Abhishek R. Singh
you would get better response on Tachyon's mailing list: https://groups.google.com/forum/?fromgroups#!forum/tachyon-users Cheers On Fri, Aug 7, 2015 at 9:56 AM, Abhishek R. Singh abhis...@tetrationanalytics.com wrote: Do people use Tachyon in production, or is it experimental grade

tachyon

2015-08-07 Thread Abhishek R. Singh
Do people use Tachyon in production, or is it experimental grade still? Regards, Abhishek - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org

Re: How to increase parallelism of a Spark cluster?

2015-08-02 Thread Abhishek R. Singh
I don't know if (your assertion/expectation that) workers will process things (multiple partitions) in parallel is really valid. Or if having more partitions than workers will necessarily help (unless you are memory bound - so partitions is essentially helping your work size rather than
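The reply's point can be made concrete with a little arithmetic: with a fixed number of cores, extra partitions run in successive "waves" rather than in parallel, so they shrink per-task work size without adding parallelism. A small sketch of that (assuming one task per core per wave, which ignores scheduling overlap):

```python
import math

def waves(num_partitions, total_cores):
    """Number of scheduling waves needed: at most total_cores tasks
    run at once, so extra partitions add waves, not parallelism."""
    return math.ceil(num_partitions / total_cores)

print(waves(8, 8), waves(64, 8), waves(9, 8))  # 1 8 2
```

So more partitions than cores only helps when tasks are memory-bound (smaller working sets) or skewed; otherwise parallelism is capped by the cores available.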

spark streaming disk hit

2015-07-21 Thread Abhishek R. Singh
Is it fair to say that Storm stream processing is completely in memory, whereas spark streaming would take a disk hit because of how shuffle works? Does spark streaming try to avoid disk usage out of the box? -Abhishek- - To

Re: spark streaming disk hit

2015-07-21 Thread Abhishek R. Singh
comparison for end-to-end performance. You could take a look at this. https://spark-summit.org/2015/events/towards-benchmarking-modern-distributed-streaming-systems/ On Tue, Jul 21, 2015 at 11:57 AM, Abhishek R. Singh abhis...@tetrationanalytics.com wrote: Is it fair to say that Storm stream

Re: Grouping runs of elements in a RDD

2015-06-30 Thread Abhishek R. Singh
could you use a custom partitioner to preserve boundaries such that all related tuples end up on the same partition? On Jun 30, 2015, at 12:00 PM, RJ Nowling rnowl...@gmail.com wrote: Thanks, Reynold. I still need to handle incomplete groups that fall between partition boundaries. So, I
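The difficulty RJ raises is that a run of equal elements can straddle a partition boundary. One way to see the fix: group runs within each partition, then stitch the last run of one partition to the first run of the next when the keys match. A pure-Python sketch of that stitching step (hypothetical helper names; not Spark code):

```python
from itertools import groupby

def runs(part):
    # Collapse a partition into (value, run_length) pairs.
    return [(k, len(list(g))) for k, g in groupby(part)]

def merge_runs(parts):
    """Group runs per partition, then stitch runs that cross
    partition boundaries by merging equal adjacent keys."""
    merged = []
    for part in parts:
        for k, n in runs(part):
            if merged and merged[-1][0] == k:
                merged[-1] = (k, merged[-1][1] + n)  # boundary run
            else:
                merged.append((k, n))
    return merged

parts = [[1, 1, 2], [2, 2, 3], [3]]
print(merge_runs(parts))  # [(1, 2), (2, 3), (3, 2)]
```

A custom partitioner that co-locates related tuples (as suggested above) avoids the stitch entirely; this sketch shows the fallback when boundaries cannot be controlled.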

spark sql error with proto/parquet

2015-04-18 Thread Abhishek R. Singh
I have created a bunch of protobuf based parquet files that I want to read/inspect using Spark SQL. However, I am running into exceptions and not able to proceed much further: This succeeds (probably because there is no action yet). I can also printSchema() and count() without any

Re: Dataframes Question

2015-04-18 Thread Abhishek R. Singh
I am no expert myself, but from what I understand DataFrame is grandfathering SchemaRDD. This was done for API stability as spark sql matured out of alpha as part of 1.3.0 release. It is forward looking and brings (dataframe like) syntax that was not available with the older schema RDD. On