Is it possible to set the state backend to RocksDB without asking it to
checkpoint? We are trying to do application-level checkpointing (since it
gives us better flexibility to upgrade our Flink pipeline and also restore
state in an application-specific, upgrade-friendly way), so we don't really
need Flink's own checkpointing.
at
org.apache.flink.contrib.streaming.state.RocksDBStateBackend.initializeForJob(RocksDBStateBackend.java:304)
... 6 more
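(For context, a minimal sketch of the setup in question, against the 1.1.x
API; the HDFS path is a stand-in. The idea is to set RocksDB as the backend
but never call enableCheckpointing(), so no periodic barriers are injected:)

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend
    import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // the backend constructor still wants a checkpoint data URI, even if
    // we never trigger Flink checkpoints ourselves
    env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"))
    // note: no env.enableCheckpointing(...) call anywhere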
> On Jan 23, 2017, at 8:20 AM, Abhishek R. Singh
> <abhis...@tetrationanalytics.com> wrote:
>
> Is there a limit on how many DataStreams can be defined in a streaming program?
// (opening of the snippet lost in the archive; 'doUnion' is a stand-in flag)
if (doUnion) { // union all sub-pipelines into a single printed sink
    for (int i = 1; i < nParts; i++) {
        combined = combined.union(streams.get(i));
    }
    combined.print().setParallelism(1);
} else { // or print each sub-pipeline in parallel
    for (int i = 1; i < nParts; i++) {
        streams.get(i).print();
    }
}
> On Jan 23, 2017, at 6:14 AM, Abhishek R. Singh
> <abhis...@tetrationanalytics.com> wrote:
>
I even made it 10 minutes:
akka.client.timeout: 600s
But it doesn't feel like it is taking effect. It still fails at about the same
time with the same error.
-Abhishek-
> On Jan 23, 2017, at 6:04 AM, Abhishek R. Singh
> <abhis...@tetrationanalytics.com> wrote:
>
Yes, I had increased it to 5 minutes. It just sits there and bails out again.
> On Jan 23, 2017, at 1:47 AM, Jonas wrote:
>
> The exception says that you can increase 'akka.client.timeout'.
>
> Did you already try that?
I am using version 1.1.4 (latest stable)
> On Jan 23, 2017, at 12:41 AM, Abhishek R. Singh
> <abhis...@tetrationanalytics.com> wrote:
>
I am trying to construct a topology like this (shown for parallelism of 4):
basically n parallel windowed processing sub-pipelines with a single source and
a single sink:
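(The original diagram did not survive the archive; here is a hypothetical
Scala sketch of the shape, with stand-in source, key, and aggregate:)

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.time.Time

    val nParts = 4
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // stand-in source; the real job has its own single source
    val source: DataStream[(Int, Long)] =
      env.fromElements((0, 1L), (1, 2L), (2, 3L), (3, 4L))

    // n parallel windowed sub-pipelines
    val parts = (0 until nParts).map { i =>
      source
        .filter(_._1 % nParts == i)   // route records to sub-pipeline i
        .keyBy(_._1)
        .timeWindow(Time.seconds(10))
        .sum(1)                       // per-part windowed aggregate
    }

    // fan back in to a single sink
    parts.reduce(_ union _).print().setParallelism(1)
    env.execute("fan-out/fan-in sketch")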
I am getting the following failure (if I go beyond 28 - found empirically
using binary search). There is nothing in the job
>
> On Wed, Dec 14, 2016 at 1:20 AM, Abhishek R. Singh
> <abhis...@tetrationanalytics.com <mailto:abhis...@tetrationanalytics.com>>
> wrote:
> Not sure how to go from here. How do I create a PR for this?
>
> $ git branch
> * doc-checkpoint-notify
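(A typical flow from here, assuming the branch lives in a GitHub fork whose
remote is named origin: push the branch, then open the pull request from the
GitHub UI:)

    git push origin doc-checkpoint-notify
    # then on your fork's GitHub page, use "Compare & pull request"
    # against apache/flink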
see the `org.apache.flink.runtime.state.CheckpointListener` interface.
## State Checkpoints in Iterative Jobs
> On Dec 12, 2016, at 3:11 PM, Abhishek R. Singh
> <abhis...@tetrationanalytics.com> wrote:
>
> https://issues.apache.org/jira/browse/FLINK-5323
Ack. Love the project !!

Thanks for the awesomeness.
>
>
> On Mon, Dec 12, 2016 at 12:29 PM Stephan Ewen <se...@apache.org
> <mailto:se...@apache.org>> wrote:
> Thanks for reporting this.
> It would be awesome if you could file a JIRA or a pull request.
import org.apache.flink.runtime.state.CheckpointListener;
-Abhishek-
> On Dec 9, 2016, at 4:30 PM, Abhishek R. Singh
> <abhis...@tetrationanalytics.com> wrote:
>
I can’t seem to find CheckpointNotifier. Appreciate help !
CheckpointNotifier is not a member of package
org.apache.flink.streaming.api.checkpoint
From my pom.xml:

    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-scala_2.11</artifactId>
      <version>1.1.3</version>
    </dependency>
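(For anyone hitting the same error: in 1.1.x the interface is
org.apache.flink.runtime.state.CheckpointListener; CheckpointNotifier was the
older name. A hypothetical Scala sketch of a sink using it; the buffering
logic is made up, not from this thread:)

    import org.apache.flink.runtime.state.CheckpointListener
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction

    class AckingSink extends RichSinkFunction[String] with CheckpointListener {
      @transient private var pending: List[String] = Nil

      override def invoke(value: String): Unit = {
        pending = value :: pending // buffer until a checkpoint completes
      }

      // called once the checkpoint covering these records has completed;
      // a safe point to commit them to an external system
      override def notifyCheckpointComplete(checkpointId: Long): Unit = {
        pending = Nil
      }
    }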
It's not so much about latency, actually. The bigger rub for me is that the
state has to be reshuffled every micro/mini-batch (unless I am not
understanding the Spark 2.0 state model right).
The operator model avoids this by preserving state locality. Event time
processing and state purging are
Am I missing something?
>
>> On 19 May 2016, at 20:36, Abhishek R. Singh <abhis...@tetrationanalytics.com
>> <mailto:abhis...@tetrationanalytics.com>> wrote:
>>
I was wondering how checkpoints can be async? Because your state is constantly
mutating. You probably need versioned state, or immutable data structs?
-Abhishek-
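(A toy Scala illustration of the versioned/immutable idea, not Flink's actual
mechanism: because the map is persistent, grabbing a reference is an O(1)
snapshot, and serialization can proceed in the background while writes
continue:)

    import scala.concurrent.{ExecutionContext, Future}

    object AsyncSnapshotToy {
      @volatile private var state: Map[String, Long] = Map.empty

      def update(k: String, v: Long): Unit =
        state = state.updated(k, v) // new version; old versions stay intact

      def snapshotAsync()(implicit ec: ExecutionContext): Future[Array[Byte]] = {
        val frozen = state // captures the current immutable version, O(1)
        Future(frozen.toString.getBytes("UTF-8")) // stand-in serialization
      }
    }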
> On May 19, 2016, at 11:14 AM, Paris Carbone wrote:
>
> Hi Stavros,
>
> Currently, rollback failure recovery in
Hi,
Can we define custom sources in Flink? Control the barriers and (thus)
checkpoints at good watermark points?
-Abhishek-
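(A minimal custom-source sketch in Scala, with assumed logic. Note that user
code does not inject barriers itself; the JobManager triggers them at the
sources, but a source can hold the checkpoint lock to control where snapshots
land relative to emitted records:)

    import org.apache.flink.streaming.api.functions.source.SourceFunction

    class CounterSource extends SourceFunction[Long] {
      @volatile private var running = true

      override def run(ctx: SourceFunction.SourceContext[Long]): Unit = {
        var i = 0L
        while (running) {
          // hold the checkpoint lock so emission and any state update
          // are atomic with respect to snapshotting
          ctx.getCheckpointLock.synchronized {
            ctx.collect(i)
            i += 1
          }
          Thread.sleep(10)
        }
      }

      override def cancel(): Unit = running = false
    }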
You had:
RDD.reduceByKey((x, y) => x + y)
RDD.take(3)
Maybe try:
val rdd2 = RDD.reduceByKey((x, y) => x + y)
rdd2.take(3)
-Abhishek-
On Aug 20, 2015, at 3:05 AM, satish chandra j jsatishchan...@gmail.com wrote:
Hi All,
I have data in an RDD as mentioned below:
RDD : Array[(Int, Int)] =
A workaround would be to have multiple passes on the RDD, with each pass
writing its own output?
Or do it in a single pass inside a foreachPartition (opening multiple files
per partition to write out)?
-Abhishek-
On Aug 14, 2015, at 7:56 AM, Silas Davis si...@silasdavis.net wrote:
Would it be right
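(A hypothetical Scala sketch of the single-pass foreachPartition idea above;
the paths and the bucketing key are stand-ins, and a real job would write to
HDFS with proper atomicity rather than local temp files:)

    import java.io.PrintWriter
    import org.apache.spark.{SparkConf, SparkContext}

    object MultiOutputWrite {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("multi-output"))
        val data = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

        data.foreachPartition { iter =>
          // one writer per output bucket, opened lazily within the partition
          val writers = scala.collection.mutable.Map[String, PrintWriter]()
          try {
            iter.foreach { case (bucket, value) =>
              val w = writers.getOrElseUpdate(
                bucket,
                new PrintWriter(
                  s"/tmp/out-$bucket-${java.util.UUID.randomUUID}"))
              w.println(value)
            }
          } finally writers.values.foreach(_.close())
        }
        sc.stop()
      }
    }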
You would get a better response on Tachyon's mailing list:
https://groups.google.com/forum/?fromgroups#!forum/tachyon-users
Cheers
On Fri, Aug 7, 2015 at 9:56 AM, Abhishek R. Singh
abhis...@tetrationanalytics.com wrote:
Do people use Tachyon in production, or is it experimental grade still?
Regards,
Abhishek
I don't know if your assertion/expectation that workers will process multiple
partitions in parallel is really valid. Nor that having more partitions than
workers will necessarily help (unless you are memory bound, in which case more
partitions essentially reduce your per-task working set rather than
Is it fair to say that Storm stream processing is completely in memory, whereas
Spark Streaming would take a disk hit because of how shuffle works?
Does Spark Streaming try to avoid disk usage out of the box?
-Abhishek-
comparison for end-to-end performance. You could take a look at this:
https://spark-summit.org/2015/events/towards-benchmarking-modern-distributed-streaming-systems/
On Tue, Jul 21, 2015 at 11:57 AM, Abhishek R. Singh
abhis...@tetrationanalytics.com wrote:
Is it fair to say that Storm stream
Could you use a custom partitioner to preserve boundaries such that all related
tuples end up on the same partition?
On Jun 30, 2015, at 12:00 PM, RJ Nowling rnowl...@gmail.com wrote:
Thanks, Reynold. I still need to handle incomplete groups that fall between
partition boundaries. So, I
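(A hypothetical Scala sketch of such a partitioner; the Long group id is an
assumption about the key type:)

    import org.apache.spark.Partitioner

    // all tuples sharing a group id land on the same partition, so groups
    // never straddle partition boundaries
    class GroupPartitioner(partitions: Int) extends Partitioner {
      override def numPartitions: Int = partitions
      override def getPartition(key: Any): Int = key match {
        case groupId: Long =>
          java.lang.Math.floorMod(groupId, partitions.toLong).toInt
        case other =>
          java.lang.Math.floorMod(other.hashCode.toLong, partitions.toLong).toInt
      }
    }

    // usage, assuming an RDD[(Long, String)]:
    // rdd.partitionBy(new GroupPartitioner(8))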
I have created a bunch of protobuf-based Parquet files that I want to
read/inspect using Spark SQL. However, I am running into exceptions and am not
able to proceed much further.
Loading the files succeeds (probably because there is no action yet). I can
also printSchema() and count() without any
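(A minimal sketch of the inspection steps, against the Spark 1.3-era API; the
path is a stand-in:)

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("parquet-inspect"))
    val sqlContext = new SQLContext(sc)

    // loading is lazy: no Parquet row data is read yet
    val df = sqlContext.parquetFile("/tmp/protobuf-output.parquet")
    df.printSchema()    // schema comes from file metadata, still cheap
    println(df.count()) // count() is an action and triggers a real scan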
I am no expert myself, but from what I understand DataFrame supersedes
SchemaRDD ("grandfathering" it in). This was done for API stability as Spark
SQL matured out of alpha as part of the 1.3.0 release.
It is forward-looking and brings (DataFrame-like) syntax that was not available
with the older SchemaRDD.